1. The puzzle of polycentric functionality


Put succinctly, a polycentric governance structure is one that involves multiple decision-making units, each with authority 
over a specified, but evolving jurisdiction, which interact in various ways, according to a set of overarching rules (Aligica and 
Tarko, 2012). This structure typically engenders diverse forms of competition (Stephan et al., 2019); for instance, jurisdictions, 
which may be territorial or non-territorial, often compete for members. 

At the same time, many social scientists have documented the surprising functionality of polycentric governance structures.
Their desirable attributes include greater resilience (Carlisle and Gruby, 2019), as well as a greater capacity for coordination
(Tarko, 2022), information processing, and adaptation (Andersson and Ostrom, 2008).

Together, these features - decentralized competition and high functionality - produce a puzzle: how does decentralized
competition generate beneficial social outcomes? Although invisible hand processes often produce a surprising degree
of order, they typically presuppose a certain institutional background. Decentralization and competition do not, in general,
guarantee functional social order (Wilson, 2016). Without the right institutions the invisible hand is liable to become an
“invisible fist” (Anomaly and Brennan, 2014).¹ Only under the right institutional structure do “invisible hand processes”
incentivize local actions that produce global benefits.² Why is it, then, that polycentric systems often support high-quality


* I would like to thank David Schmidtz, Thomas Christiano, Vlad Tarko, and Justin Bruner for many helpful discussions on the topic of this paper. Peter 
Boettke, Cameron Harwick, and Abigail Devereaux offered insightful comments when I presented earlier versions of this paper at the 2022 Markets & 
Society Conference and the 2021 Meeting of the Southern Economic Association. I'd also like to thank Mario Ivan Juárez Garcia and Matthew Jeffers for 
their reliable willingness to discuss premature versions of my arguments. 

E-mail address: Schaefer.alexander@nyu.edu 

1 See also Martin and Storr (2008). 

2 Elinor Ostrom expressed this issue by referencing the possibility of “local tyrannies” that might arise in self-governing institutional structures
(Ostrom, 2009, 282-3). 
governance? Put more strongly, how can decentralized competition give rise to beneficial order, rather than repugnant equilibria
or disorderly chaos?

This paper advances our understanding of polycentric governance by drawing on multilevel selection theory. This theory 
is arguably our best tool for understanding the relationship between lower level rationality and emergent functionality in 
unplanned, decentralized systems. It has already been fruitfully applied to identify precise conditions under which group-beneficial
adaptations are likely to emerge (Henrich, 2004), but experts on polycentricity have yet to examine the conceptual
connection between multilevel selection theory and theories of polycentricity.³ In an effort to construct this bridge, the
paper begins by laying out the formal framework of multilevel selection theory, encapsulated in two formulas derived from
Price’s equation (Section 2).⁴ It then identifies the crucial features - here called “functionality desiderata” - that a system
must exhibit in order to achieve group-level success. Through the lens of this formal framework, the next section (Section 3) 
shows how various features of polycentric governance support collective functionality. To clarify this abstract treatment, 
Section 3 also examines a concrete example of polycentric governance: the scientific research community. The multilevel 
selection framework developed in Sections 2 and 3 illuminates the capacity of polycentric governance structures to evolve 
rules and norms that benefit communities, while suppressing opportunistic behavior. 


2. The Price equation and multilevel selection
2.1. The basic Price equation


When systems evolve as a result of the myopic choices of their members, this generally results “in a breakdown of group- 
level functional organization” (Wilson, 38). As economists and biologists have long known, lower-level rationality does not 
automatically scale; social dilemmas abound, leading to suboptimal outcomes for economies and species. One solution is 
for social planners to direct the behavior of individuals towards well-defined goals, but this solution works well only under 
special conditions. In more realistic settings, institutional frameworks must allow for decentralized decision-making, and 
polycentric governance structures offer an extreme version of such frameworks. Given the high degree of local autonomy, 
we must wonder how polycentric governance structures are able to secure desirable outcomes at the global level. Individuals 
within various jurisdictions do not, in general, possess the knowledge or the desire to produce group-beneficial outcomes. 
Yet, somehow, polycentric political organization often promotes effective governance (Thiel et al., 2019). What features of 
polycentric organization underlie this surprising capacity? 

To understand how a decentralized network of decision-making entities can produce and maintain beneficial social order, 
it will help to draw on multilevel selection theory. In particular, two versions of the Price equation provide deep insight into 
the conditions in which such social order will arise and persist. 

First, we need to establish some notation: 


• $z$: A (phenotypic) trait of some kind, which can be measured (discretely or continuously) with real numbers.
• $z_i$: The level of $z$ exhibited by the entity $i \in P = \{1, \ldots, n\}$.
• $\bar{z}$: The average level of $z$ in the $P$-population.
  ◦ Mathematically, $\bar{z} = \frac{1}{n} \sum_{i=1}^{n} z_i$.
• $\Delta z_i$: The change in the level of $z$ from one entity to its offspring.
  ◦ Taking an average across all entities in $P$, the expectation of $\Delta z_i$ is $\mathrm{E}[\Delta z] = \frac{1}{n} \sum_{i=1}^{n} \Delta z_i$.
• $w_i$: The “fitness” or average number of offspring produced by entity $i$.
  ◦ Similarly to $\bar{z}$, we define the average fitness as $\bar{w} = \frac{1}{n} \sum_{i=1}^{n} w_i$.
  ◦ Often we will be interested, not in the absolute fitness of an entity, but in its fitness relative to the other entities in
    the group. For this comparative purpose, we define the relative fitness of entity $i$ as $y_i = w_i / \bar{w}$.


With this notation in hand, a form of the Price equation can be written as follows:⁵

$$\Delta\bar{z} = \mathrm{Cov}(y, z) + \mathrm{E}[y\,\Delta z]. \tag{1}$$


Although simple, Eq. (1) reveals the key components of evolutionary change. It decomposes the total change in $\bar{z}$, that
is, the total change in the average level of the z-trait, into two components: $\mathrm{Cov}(y, z)$ and $\mathrm{E}[y\,\Delta z]$. $\mathrm{Cov}(y, z)$ denotes the
statistical association between $y$, relative fitness, and $z$, the level of the z-trait. When these two variables move together,
that is, when greater relative fitness is associated with higher levels of $z$, the covariance will be positive: $\mathrm{Cov}(y, z) > 0$. When
they are totally unrelated, $\mathrm{Cov}(y, z) = 0$. And when they move in opposite directions, e.g. higher levels of the z-trait go along
with lower relative fitness, the covariance will be negative: $\mathrm{Cov}(y, z) < 0$. Therefore, this first term is often identified with
the evolutionary force of selection (Gardner, 2008).


3 Wilson et al. (2013) have applied multilevel selection to explain how Elinor Ostrom’s eight “design principles” (Ostrom, 2016) support successful 
outcomes, but do not directly address the relationship between polycentricity and multilevel selection. 

4 The Price equation approach is not the only approach to understanding multilevel selection. Okasha (2006), for example, presents a “contextual
approach” that sometimes provides a better causal decomposition of the forces at play in multilevel selection. However, the Price approach has proven quite
successful at illuminating social evolutionary processes (Turchin, 2011).

5 The subscript $i$ has been dropped on all terms that occur within an expected value or a covariance operation, since this operation is performed across
the entire population. See Appendix A.1 for a derivation with more precise notation.




A. Schaefer Journal of Economic Behavior and Organization 210 (2023) 265-287 


The second quantity on the right-hand side of Eq. (1), $\mathrm{E}[y\,\Delta z]$, denotes the weighted average of the change in z-levels
between parents and offspring. Each entity $i \in P$ exhibits level $z_i$ of the z-trait and produces some number of offspring
$w_i$ with a level $z_i'$ of the z-trait. If we want to measure the average transmission rate of the z-trait, it makes sense to take
the average change between parents and offspring, but for a more accurate measure, we should also heavily weight those
entities with many offspring and discount those with few offspring. $\mathrm{E}[y\,\Delta z]$ achieves this weighting by multiplying each
$\Delta z_i$ by the relative fitness of entity $i$, which is $w_i/\bar{w} = y_i$.⁶ In short, $\mathrm{E}[y\,\Delta z]$ gives us a measure of the population’s overall
transmission bias or copying fidelity. If all entities in $P$ produce offspring with higher levels of the z-trait, then $\mathrm{E}[y\,\Delta z] > 0$. If
they all tend to produce entities with lower levels of the z-trait, then $\mathrm{E}[y\,\Delta z] < 0$. When the transmission is perfect, i.e. $z_i = z_i'$
for all $i \in P$, then $\mathrm{E}[y\,\Delta z] = 0$.⁷ In that case, the only evolutionary force in operation is selection.
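The decomposition can be checked numerically. The following is a minimal sketch, not taken from the paper: the three-entity population, its fitness values, and the function name `price_decomposition` are illustrative assumptions. It computes the selection term and the transmission term separately and compares their sum with the change in the average trait computed directly from the offspring generation.

```python
from statistics import fmean

def price_decomposition(z, w, z_offspring):
    """Return (selection, transmission, total) for a hypothetical population.

    z[i] is the parent's trait level, w[i] its number of offspring, and
    z_offspring[i] the mean trait level among the offspring of parent i.
    """
    w_bar = fmean(w)
    y = [wi / w_bar for wi in w]                      # relative fitness y_i
    z_bar = fmean(z)
    dz = [zo - zi for zo, zi in zip(z_offspring, z)]  # Delta z_i
    # Population covariance Cov(y, z); the mean of y is 1 by construction.
    selection = fmean([(yi - 1.0) * (zi - z_bar) for yi, zi in zip(y, z)])
    # Fitness-weighted transmission term E[y * Delta z].
    transmission = fmean([yi * d for yi, d in zip(y, dz)])
    # Direct computation: offspring-weighted mean trait minus parental mean.
    z_bar_next = sum(wi * zo for wi, zo in zip(w, z_offspring)) / sum(w)
    total = z_bar_next - z_bar
    return selection, transmission, total

# Illustrative values only (assumed, not from the paper).
z = [0.2, 0.5, 0.8]
w = [1.0, 2.0, 3.0]
z_off = [0.2, 0.45, 0.75]
sel, trans, total = price_decomposition(z, w, z_off)
```

In this example fitness rises with the trait, so the selection term is positive, while the fitter parents pass on slightly less of the trait than they have, so the transmission term is negative. Their sum matches the directly computed change.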

To achieve some intuition for these terms, let us consider a specific interpretation.⁸ In this interpretation, we consider
two periods, one where entities compete for resources, and another in which they reproduce. We also assume that the z-trait
exerts direct causal force in determining the survival of these entities, thus indirectly affecting the expected number of
offspring that entities produce. In this simple scenario, $\mathrm{Cov}(y, z)$ tells us how well the z-trait promotes survival in the first
period. Some entities will die, some will survive, and $\mathrm{Cov}(y, z)$ tells us how much the z-trait has contributed to survival
ability. In the second period, when the entities reproduce, $\mathrm{E}[y\,\Delta z]$ adds a further change by representing the amount of the
z-trait inherited by the offspring.

This can be made more concrete with some illustrative examples. The first is a standard biological example, while the 
second example concerns the cultural trait of adhering to a social norm. 


1. Consider a population $N = \{1, \ldots, n\}$ of polar bears. For each polar bear $i \in N$, we can measure the heaviness (and corresponding
warmth) of its fur as some number $z_i \in (0, 1)$, where 0 corresponds to no fur at all and 1 represents the
heaviest possible coat that a polar bear could physically grow. For a population that inhabits its environment of evolutionary
adaptation, we can assume that each $i \in N$ will have $z_i$ close to the optimal level of $z$, call it $z^*$. This value is
optimal in the sense that it maximizes the survival and reproduction of a polar bear. Now suppose there has been some
climatic shift, a new ice age has set in, and every bear in $N$ is now insufficiently insulated against the cold. In this case,
polar bears with higher levels of $z$ will survive longer and produce more offspring than polar bears with lower levels.
Hence, there will be a positive correlation between the heaviness of a bear’s coat, $z$, and its relative fitness, $y$. This means
that $\mathrm{Cov}(y, z) > 0$ in the relevant range. If the heaviness of a bear’s coat is perfectly inherited by its offspring, then this
is the end of the story. However, for some genetic reason, the heaviness of fur may not be perfectly inherited. If copying
errors tend to produce lighter coats, then $\mathrm{E}[y\,\Delta z] < 0$. If they tend to produce heavier coats, then $\mathrm{E}[y\,\Delta z] > 0$. And if the
error is symmetrically distributed, then $\mathrm{E}[y\,\Delta z] = 0$. The Price equation therefore decomposes the evolutionary process
into two distinct effects: (1) selection and (2) inheritance.

2. In an article on the evolution of social norms, Ostrom (2014) distinguishes between rational egoists and norm-following
cooperators. Rational egoists will maximize their material holdings in any strategic interaction, while norm-following
cooperators follow a rule of initiating cooperation when they estimate that others will reciprocate.⁹ Let our population
$P = \{1, \ldots, m\}$ contain a mix of both types, and let $z$ represent the types so that $z_i = 0$ if agent $i$ is a rational egoist, and
$z_i = 1$ if agent $i$ is a norm-following cooperator.¹⁰ To determine which type of agent will have higher relative fitness,
$y$, we must know something about the type-distribution within the community of interaction. As Ostrom (2014, 243-4)
tells us, following norms will be advantageous “so long as almost everyone reciprocates. If a small group of users identify
each other, they can begin a process of cooperation.” This assumes that cooperators will be able to form a network of
cooperators, excluding rational egoists. If we assume that materially successful agents have an advantage in passing on
their norms, then in a population with this sort of network structure, $\mathrm{Cov}(y, z) > 0$. However, in a different network
structure, one where there is no way to exclude rational egoists, norm-following cooperators will likely fall prey to
opportunistic exploitation, and $\mathrm{Cov}(y, z) < 0$.¹¹ As in the polar bear example, if norms can be taught or inherited with
perfect accuracy, or if the errors are the same for both traits, then this is the end of the story. If we suppose, on the other
hand, that the cooperative norm is harder to teach accurately, then $\mathrm{E}[y\,\Delta z] < 0$. But if it’s easier to teach accurately, then
$\mathrm{E}[y\,\Delta z] > 0$. As in the biological example, the Price equation again decomposes the evolutionary process into the distinct
effects of selection and copying error.
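The dependence of the selection term on network structure in the second example can be illustrated with a small sketch. The payoff rule used here is an assumption made for illustration and does not come from Ostrom (2014) or the paper: each agent earns a baseline `w0`, plus a benefit `b` scaled by the share of cooperators among its partners, minus a cost `c` if it cooperates. With exclusion, cooperators interact only with cooperators and egoists only with egoists. Without exclusion, everyone faces the population-wide share.

```python
from statistics import fmean

def cov(xs, ys):
    """Population covariance of two equal-length sequences."""
    mx, my = fmean(xs), fmean(ys)
    return fmean([(x - mx) * (y - my) for x, y in zip(xs, ys)])

def selection_term(z, b, c, assortative, w0=1.0):
    """Cov(y, z) for types z_i in {0, 1} under a hypothetical payoff rule."""
    p = fmean(z)  # population share of cooperators
    w = []
    for zi in z:
        # Share of cooperators among i's interaction partners.
        partners = float(zi) if assortative else p
        w.append(w0 + b * partners - c * zi)
    w_bar = fmean(w)
    y = [wi / w_bar for wi in w]  # relative fitness
    return cov(y, z)

# Illustrative population: three cooperators, three egoists (assumed values).
z = [1, 1, 1, 0, 0, 0]
with_exclusion = selection_term(z, b=0.5, c=0.2, assortative=True)
without_exclusion = selection_term(z, b=0.5, c=0.2, assortative=False)
```

Under these assumed payoffs the sign of the covariance flips: it is positive when cooperators can exclude egoists (provided `b > c`) and negative when they cannot, since egoists then collect the same benefit without paying the cost.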


This decomposition seems rather straightforward: some part of evolution will be due to the relation between fitness and 
the z-trait, and some part of it will be due to the ability of entities to actually pass the z-trait on to their offspring. We 
might think of the first term, Cov(y,z), as representing the basic evolutionary idea that if a trait helps an entity survive 


6 A simpler, but slightly more technical, way of understanding $\mathrm{E}[y\,\Delta z]$ construes it as the expected value of a random variable $\Delta z$ whose probability
distribution assigns each $\Delta z_i$ a probability proportional to its relative fitness $w_i/\bar{w}$.

7 This also occurs when copying errors are random and symmetric about the mean. 

8 This is Okasha’s “temporal interpretation” (Okasha, 2006, 24). 

9 Ostrom (2014, 238) also introduces a third type: “willing punishers.” I set these aside until the discussion of punishment in Section 3.2. 

10 Notice that the mathematics of the Price equation operate similarly for discrete traits, like this one, and for continuous traits, as in the polar bear 
example. 

11 Again, we are ignoring the meta-norm of punishment until later in the paper (Section 3.2).




and reproduce, it will proliferate. The second term, $\mathrm{E}[y\,\Delta z]$, adds the obvious qualification that there are sometimes copying
errors between generations. The offspring do not inherit the exact level of $z$ exhibited by their parents.
The standard way of writing the Price equation takes Eq. (1) and multiplies through by $\bar{w}$ to yield:

$$\bar{w}\,\Delta\bar{z} = \mathrm{Cov}(w, z) + \mathrm{E}[w\,\Delta z] \tag{2}$$

Notice that the only real change is that $y$ has been replaced with $w$. This is just because $y_i = w_i/\bar{w}$, and we multiplied
through by $\bar{w}$. From this basic form of the Price equation, we can derive a simple expression that models multilevel selection.


2.2. The cultural, multilevel Price equation


The Price equation purports to be an entirely general description of any evolutionary process (Frank, 1998, 13). In theory, 
then, it should provide a way of modeling the process of cultural, multilevel selection, e.g. institutional evolution within a 
polycentric political framework. To see this, we first need to offer an interpretation of Eq. (2). Suppose we have a set of 
groups indexed by $j \in \{1, \ldots, N\}$. For any individual $i$ in group $j$, let $z_{ij} \in \{0, 1\}$ represent adherence to a rule or institutional
feature that is “altruistic” in a technical sense. That is, $z_{ij}$ represents adherence (1) or non-adherence (0) to a rule that
promotes the fitness of the group $j$ (defined as the average individual fitness within group $j$), but which decreases that of
the individual $i$ within group $j$. Accordingly, $z_j \in (0, 1)$ will represent the average level of the altruistic trait exhibited by
individuals within group $j$, and the variable $w_j$ indicates the average fitness within group $j$. In other words, $w_j$ indicates the
number of people that are “influenced” by an average member of group $j$.

In the cultural version of the Price equation, the notion of “influence” replaces that of reproduction. This is because 
(unlike DNA) rules, norms, ideas, and other cultural replicators can spread without organisms producing offspring. Though 
having more offspring may lead to more copies of a rule, increased copying of a rule does not entail increased offspring. 
Instead, $w_{ij}$ measures the average number of other individuals that will copy a member $i$ of group $j$ by imitating $i$'s
rule level $z_{ij}$. This marks a major departure from models of genetic evolution, and for the remainder of this paper, it will be
crucial to keep in mind that “fitness” refers to cultural fitness - i.e. the ability to influence others in their selection of rules 
of conduct. 

Another difference between biological evolution and cultural evolution concerns the criterion of selection. Setting aside 
intentional domestication and attempts at eugenic planning, selection in the biological realm occurs without foresight or 
conscious intent. Mutations happen randomly, and an organism survives and produces offspring without regard to the effect 
it has on population genetics. A genetic sequence is the byproduct of choices made without regard to their effects on the 
genome. By contrast, as Elinor Ostrom emphasizes when discussing institutional evolution, “[i]nstead of blind variation, 
human agents...use reason and persuasion in their efforts to devise better rules” (Ostrom, 1999, 524). Thus, cultural variation 
and selection involves conscious, rational choice. While random chance does play some role - “the process of choice always 
involves experimentation” (Ostrom, 1999, 524) - we often change our rules because a different rule seems to work better 
than the one that’s currently in place. 

With these two differences in mind, consider an example to illustrate the process of cultural evolution.¹² Let $z_i = 1$
represent $i$ adhering to a norm of working hard in group endeavors. As an individual, $i$ could improve her situation by
violating this rule and choosing $z_i = 0$, a life inspired by the “beach boys” or the “flower children” (Buchanan, 1994, 28).
Adherence to the work ethic benefits the members of $i$’s group, and an unwillingness to adhere degrades their quality of
life. Consider two groups, $j$ and $k$. Group $j$ mostly adheres to the work ethic norm; only 20% of its members are beach bums
($z_j = 0.8$). Group $k$ rejects the importance of work ethic; only 20% of its members feel an obligation to work hard ($z_k = 0.2$).
Recalling Elinor Ostrom’s point that cultural evolution is often driven by conscious choice, suppose that each individual
follows a policy of “success bias,” or copying norms that seem to promote success.¹³ The payoff to an industrious
(“altruistic”) individual $i$ in group $m = j, k$ will be denoted $u_a^m$, while the payoff of shirking (“egoist”) will be denoted $u_e^m$. Because
laziness confers a personal benefit, while hard work involves foregoing this benefit, these utility functions are defined as
follows:¹⁴


$$u_a^m = a z_m - c z_i = a z_m - c$$
$$u_e^m = a z_m$$


Here, $a$ is a scalar that measures the benefit of increased altruism $z$ within one’s group, $m$. By contrast, $c$ is a scalar that
represents the personal cost to agent $i$ of increasing her level of altruism (work ethic). Because we are treating altruism as
a binary trait, we have $z_i = 1$ if $i$ is a hard worker and $z_i = 0$ if $i$ is a shirker.

Will the work ethic spread or recede? At time $t$, let $\dot{p}_a^t$ represent the rate of change in the proportion of individuals who
adhere to the altruistic work ethic norm, and $\dot{p}_e^t$ represent the rate of change in the proportion of those who are egoistic,

12 This example is inspired by Buchanan (1994, 5-31), who argues that the cultural norm of “work ethic” is altruistic in a way that maps onto the 
technical sense in which altruism is defined here. 

13 This is a form of social learning that allows modeling in terms of a “replicator dynamic.” For an explanation and derivation of the replicator system 
below, see Gintis (2009, 270-3). 

14 It’s no coincidence that the form of the utility function in this simple example resembles that employed in standard models of public goods. See, for 
example, Samuelson (1954). 
lazy, free-riders. Modeling the problem in terms of simple replicator dynamics yields the following equations:

$$\dot{p}_a^t = p_a^t (\bar{u}_a^t - \bar{u}^t)$$
$$\dot{p}_e^t = p_e^t (\bar{u}_e^t - \bar{u}^t)$$

where $\bar{u}_a^t$ is the average payoff to hard workers, $\bar{u}_e^t$ is the average payoff to shirkers, $\bar{u}^t$ is the average payoff
in the population as a whole, and $p_m^t$ denotes the fraction of the total population adhering to strategy $m = a, e$ at time $t$.
When will work ethic spread, i.e., when will we have $\dot{p}_a^t > 0$? Only when $\bar{u}_a^t > \bar{u}^t > \bar{u}_e^t$. At the initial state, $t = 0$, we have


$$\bar{u}_a = 0.8[a(0.8) - c] + 0.2[a(0.2) - c] = 0.68a - c$$
$$\bar{u}_e = 0.2[a(0.8)] + 0.8[a(0.2)] = 0.32a$$
Thus, in this initial state, the work ethic norm will be drawing more adherents if and only if... 


$$0.68a - c > 0.32a \iff 0.36 > \frac{c}{a}$$


In other words, if the benefits of living in a group of hard workers outweigh the personal cost of adhering to a demanding
work ethic, then work ethic will tend to spread.¹⁵
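The arithmetic of the work ethic example can be reproduced in a few lines. This is a sketch under two assumptions that the example leaves implicit: the two groups are of equal size, and both workers and shirkers are present. The function names are hypothetical.

```python
def average_payoffs(p_j, p_k, a, c):
    """Average payoffs to workers and shirkers across two equal-sized groups.

    p_j and p_k are the shares of hard workers in groups j and k. A worker
    in group m earns a * p_m - c; a shirker in group m earns a * p_m.
    Assumes 0 < p_j + p_k < 2 so that both types exist.
    """
    # Workers are spread across the groups in proportion to p_j and p_k.
    u_a = (p_j * (a * p_j - c) + p_k * (a * p_k - c)) / (p_j + p_k)
    # Shirkers are spread in proportion to (1 - p_j) and (1 - p_k).
    u_e = ((1 - p_j) * a * p_j + (1 - p_k) * a * p_k) / (2 - p_j - p_k)
    return u_a, u_e

def work_ethic_spreads(p_j, p_k, a, c):
    """True when workers earn more on average than shirkers."""
    u_a, u_e = average_payoffs(p_j, p_k, a, c)
    return u_a > u_e

# The example from the text: 80% workers in group j, 20% in group k.
u_a, u_e = average_payoffs(0.8, 0.2, a=1.0, c=0.3)
```

With `a = 1.0` the two averages come out to `0.68 - c` and `0.32`, so the norm gains adherents at `c = 0.3` but not at `c = 0.4`, in line with the threshold of 0.36 on the cost-benefit ratio.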

The basic reasoning governing this simple example can be represented analytically by another version of the Price equation:¹⁶


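$$\bar{w}\,\Delta\bar{z} = \mathrm{Cov}(z_j, w_j) + \mathrm{E}[w_j\,\Delta z_j] \tag{3}$$

$$\bar{w}\,\Delta\bar{z} = \mathrm{Cov}(z_j, w_j) + \mathrm{E}[\mathrm{Cov}(w_{ij}, z_{ij}) + \mathrm{E}[w_{ij}\,\Delta z_{ij}]] \tag{MLPE}$$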


The addition of the subscript $j$ in Eq. (3) indicates that we are considering group quantities: $z_j$ is the average level
of trait $z$ within the group and $w_j$ is the average fitness of the group. The addition of the subscript $i$ in the next line,
(MLPE), indicates that we are considering within-group covariances and expectations, taken over individuals within a fixed
group $j$. As above, the term $\mathrm{E}[w_{ij}\,\Delta z_{ij}]$ expresses non-selective forces, such as transmission bias or copying error. To focus
on evolutionary forces we will ignore this term by setting it equal to 0.
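With the transmission term set to zero, the multilevel equation says that the overall covariance between fitness and trait splits into a between-group part and an average within-group part. The sketch below checks this split numerically. The fitness rule `w_ij = w0 + b * z_j - c * z_ij`, the parameter values, and the equal group sizes are assumptions chosen for illustration; they are not the paper's model.

```python
from statistics import fmean

def cov(xs, ys):
    """Population covariance of two equal-length sequences."""
    mx, my = fmean(xs), fmean(ys)
    return fmean([(x - mx) * (y - my) for x, y in zip(xs, ys)])

def multilevel_terms(groups, b, c, w0=1.0):
    """Return (between, within, total) covariance terms.

    groups is a list of equal-sized lists of traits z_ij in {0, 1}.
    """
    z_j = [fmean(g) for g in groups]  # group averages of the trait
    # Hypothetical fitness rule: group benefit b, individual cost c.
    w = [[w0 + b * zj - c * zij for zij in g] for g, zj in zip(groups, z_j)]
    w_j = [fmean(wg) for wg in w]     # group averages of fitness
    between = cov(z_j, w_j)                                  # Cov(z_j, w_j)
    within = fmean([cov(wg, g) for wg, g in zip(w, groups)])  # E[Cov(w_ij, z_ij)]
    all_z = [zij for g in groups for zij in g]
    all_w = [wij for wg in w for wij in wg]
    total = cov(all_w, all_z)         # covariance over all individuals
    return between, within, total

# Two groups mirroring the 80% / 20% work ethic example.
groups = [[1, 1, 1, 1, 0], [1, 0, 0, 0, 0]]
between, within, total = multilevel_terms(groups, b=0.5, c=0.1)
```

Here the between-group term is positive and the within-group term is negative, and the two sum to the population-wide covariance. Whether the trait spreads depends on which term dominates.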

The Price equation represents the proliferation of a particular trait in a population, but it’s important to bear in mind 
that the fitness of a particular trait will depend upon which other traits are prevalent or absent in the population. For 
instance, the deadliness of a cobra’s venom enhances fitness only to the extent that the cobra also has sharp teeth capable 
of piercing animal flesh. The same idea applies, perhaps even more strongly, to cultural traits. The most important example 
in this regard is that of punishment. Consider a norm of respecting property rights, z. In the absence of punishment, an 
opportunistic free rider in group $j$ may be better off disregarding this norm: $\mathrm{E}[\mathrm{Cov}(w_{ij}, z_{ij})] < 0$. If, on the other hand, there
is a commonly accepted norm of punishing those who violate property rights, then accepting the norm of property may
prove advantageous: $\mathrm{E}[\mathrm{Cov}(w_{ij}, z_{ij})] > 0$. In general, punishment can transform an otherwise altruistic trait into one that
agents must adhere to for the sake of their own success. Insofar as punishment is costly, however, the norm of punishment 
itself is altruistic. The crucial issue of punishment will be discussed further in Section 3.2. 

The next subsection presents some important implications of the cultural, multilevel version of the Price equation. In 
particular, two equations derived from (MLPE) provide a clean representation of the conditions under which group beneficial 
rules are likely to evolve. 


2.3. Implications of the Price equation


One more mathematical fact is required to complete the derivation. If we denote as $\beta_1$ the regression coefficient for $w_j$
on $z_j$ and $\beta_2$ the regression coefficient for $w_{ij}$ on $z_{ij}$, then we have:

$$\mathrm{Cov}(z_j, w_j) = \beta_1 \mathrm{Var}(z_j)$$
$$\mathrm{Cov}(z_{ij}, w_{ij}) = \beta_2 \mathrm{Var}(z_{ij}).$$

Letting $\mathrm{E}[w_{ij}\,\Delta z_{ij}] = 0$ for reasons mentioned above and substituting these two equations into (MLPE) produces the following:

$$\bar{w}\,\Delta\bar{z} = \beta_1 \mathrm{Var}(z_j) + \beta_2 \mathrm{Var}(z_{ij}) \tag{4}$$


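15 More generally, if $p_j^t$ is the fraction of altruists in group $j$ at time $t$ and $p_k^t$ is the fraction of altruists in group $k$ at time $t$, then (for groups of equal size, as in the example) altruism will have a
higher payoff whenever $\frac{c}{a} < \frac{(p_j^t)^2 + (p_k^t)^2}{p_j^t + p_k^t} - \frac{p_j^t (1 - p_j^t) + p_k^t (1 - p_k^t)}{2 - p_j^t - p_k^t}$.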
16 See Appendix A.2 for a derivation of (MLPE). 


17 See Appendix A.3 for a more thorough explanation. 
Because they are regression coefficients, $\beta_1$ gives a measure of how changing the average level of $z$ within a group affects
the average fitness of the group, while $\beta_2$ measures how changing the level of $z$ of an individual within a group will change
that individual's fitness. The expression (4) thus represents the evolution of the average level of trait $z$ as a composition of
a between-group portion and a within-group portion. From (4), we can infer that the trait $z$ will spread when...

$$\frac{\mathrm{Var}(z_j)}{\mathrm{Var}(z_{ij})} > -\frac{\beta_2}{\beta_1} \tag{*}$$

Assuming that the trait $z$ is “altruistic” in the technical sense of benefiting the group, but not an individual within a group,
then $\beta_1 > 0$ and $\beta_2 < 0$.

There are several crucial implications of the expression (*). The left-hand side states that a prosocial trait $z$ is more likely
to spread when variance within groups is minimized and between-group variance is maximized. The right-hand side tells
us that the strength of selection pressures is also crucial. If the trait $z$ is extremely harmful to an individual within a group,
i.e. $|\beta_2|$ is large, then it is unlikely to evolve. If it is extremely beneficial for the group, i.e. $\beta_1$ is large, then it is more likely
to evolve.
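For readers who want the intermediate algebra, the step from (4) to (*) can be spelled out as follows. This is a sketch that relies on the altruism assumption $\beta_1 > 0$ and on the within-group variance being strictly positive, so that dividing by either quantity preserves the direction of the inequality.

```latex
\begin{align*}
\bar{w}\,\Delta\bar{z} > 0
  &\iff \beta_1 \mathrm{Var}(z_j) + \beta_2 \mathrm{Var}(z_{ij}) > 0 \\
  &\iff \beta_1 \mathrm{Var}(z_j) > -\beta_2 \mathrm{Var}(z_{ij}) \\
  &\iff \frac{\mathrm{Var}(z_j)}{\mathrm{Var}(z_{ij})} > -\frac{\beta_2}{\beta_1}
\end{align*}
```

Since $\beta_2 < 0$ for an altruistic trait, the right-hand side equals $|\beta_2| / \beta_1$: the ratio of between-group to within-group variance must exceed the ratio of individual cost to group benefit.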

This framework allows for a rigorous analysis of the sorts of design features that will enable group-beneficial adaptations, 
even when such adaptations run against lower-level selective pressures. 

The Price equation also suggests another useful formula that can aid in understanding what conditions must be met for 
a group beneficial rule to proliferate.¹⁸ As a first step, consider breaking up an individual's fitness, $w_i$, into two components,
one determined by the rule itself and the second determined by the amount of rule adherence within the individual’s group
or, more precisely, network of interaction. Each of these components will make some separate contribution, but they are not
statistically independent. Having a high level of trait $z$ may, for example, predict that one’s network of interaction is more
likely to exhibit high average levels of $z$. This is not only because an individual will directly contribute to the average level
of $z$ within his or her group, but also because individuals with high levels of $z$ may preferentially interact with others
who exhibit a high (or low) level of $z$. To isolate the effects of individual $z$ levels from those of group $z$ levels, we must
therefore write out $w_i$ as a sum of partial regression coefficients (Allen, 1997). To denote the fitness effect of increasing an
individual’s level of rule adherence, $z_i$, while holding group adherence constant, we write the partial regression coefficient
$\beta_{w_i z_i \cdot z_j}$. Similarly, to denote the fitness effect of increasing group adherence, $z_j$, while holding individual adherence constant,
we write the partial regression coefficient $\beta_{w_i z_j \cdot z_i}$. Putting these together in a regression equation yields...


$$w_i = \beta_0 + \beta_{w_i z_i \cdot z_j} z_i + \beta_{w_i z_j \cdot z_i} z_j + \epsilon \tag{5}$$


where $\beta_0$ represents base fitness and $\epsilon$ is an error term. Substituting (5) into the Price equation produces the following
equation:¹⁹


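$$\bar{w}\,\Delta\bar{z} = \beta_{w_i z_i \cdot z_j} \mathrm{Var}(z_i) + \beta_{w_i z_j \cdot z_i} \beta_{z_j z_i} \mathrm{Var}(z_i)$$
$$= (\beta_{w_i z_i \cdot z_j} + \beta_{z_j z_i} \beta_{w_i z_j \cdot z_i}) \mathrm{Var}(z_i) \tag{6}$$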


Since $\mathrm{Var}(z_i) > 0$ as a mathematical fact, this implies that a trait will be selected for, i.e. $\bar{w}\,\Delta\bar{z} > 0$, only if we have:

$$\beta_{w_i z_i \cdot z_j} + \beta_{z_j z_i} \beta_{w_i z_j \cdot z_i} > 0 \tag{**}$$


If we assume we are talking about an altruistic trait, then we know the following facts: 


$$\beta_{w_i z_i \cdot z_j} < 0$$
$$\beta_{w_i z_j \cdot z_i} > 0$$


Given these two facts, our prediction of whether an altruistic trait will spread depends upon a crucial feature of the
population-interaction structure. In order to ensure that condition (**) is satisfied, we would like $\beta_{z_j z_i}$ to be large and
positive. In other words, going back to the intuitive meaning of the expression, an altruistic trait is more likely to be selected
when altruists are capable of bunching together. This point is of fundamental importance for understanding multilevel selection
and the evolution of altruism, so it bears repeating: in order for an altruistic trait to evolve (through selection), altruists
must have some mechanism(s) for excluding egoists from their network of interaction or, equivalently, of converting egoists
within their network into altruists.²⁰
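The role of assortment can be made concrete with a small numerical sketch. It assumes an exactly linear fitness rule, `w_i = w0 - c * z_i + b * z_group`, so that the two partial regression coefficients are `-c` and `b` by construction and the error term is zero. The grouping data, the parameter values, and the function name are illustrative assumptions. The code estimates the assortment coefficient (the regression of an individual's group average on the individual's own trait) and checks that the overall covariance between fitness and trait equals the bracketed sum times the trait variance.

```python
from statistics import fmean

def cov(xs, ys):
    """Population covariance of two equal-length sequences."""
    mx, my = fmean(xs), fmean(ys)
    return fmean([(x - mx) * (y - my) for x, y in zip(xs, ys)])

def assortment_check(groups, b, c, w0=1.0):
    """Return (beta_assort, lhs, rhs) for a hypothetical linear fitness rule."""
    z_ind, z_grp, w = [], [], []
    for g in groups:
        zg = fmean(g)  # average trait in the individual's group
        for zi in g:
            z_ind.append(zi)
            z_grp.append(zg)
            w.append(w0 - c * zi + b * zg)
    var_z = cov(z_ind, z_ind)
    # Regression of group average on own trait: the assortment coefficient.
    beta_assort = cov(z_grp, z_ind) / var_z
    lhs = cov(w, z_ind)                     # selection term Cov(w, z)
    rhs = (-c + beta_assort * b) * var_z    # bracketed sum times Var(z_i)
    return beta_assort, lhs, rhs

# Altruists mostly bunched together versus spread almost evenly.
sorted_groups = [[1, 1, 1, 1, 0], [1, 0, 0, 0, 0]]
mixed_groups = [[1, 0, 1, 0, 1], [0, 1, 0, 1, 0]]
beta_s, lhs_s, rhs_s = assortment_check(sorted_groups, b=0.5, c=0.1)
beta_m, lhs_m, rhs_m = assortment_check(mixed_groups, b=0.5, c=0.1)
```

With the same cost and benefit, the bunched population has a much larger assortment coefficient and a positive selection term, while the nearly even population has a small coefficient and a negative selection term.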
Summing up the implications of (*) and (**), we can identify four functionality desiderata:²¹


1. Prosocial traits are more likely to emerge when within-group variance $\mathrm{Var}(z_{ij})$ is minimized, perhaps due to the ability
of altruists to group together and to exclude or convert egoists. 


18 This second formula is inspired by Henrich (2004), but my formulation differs slightly. See Appendix A.4 for the derivation. 

19 See Appendix A.4 for a thorough explanation and derivation of this equation. 

20 Although, as Okasha (2006, 194) points out, this claim is true with respect to “strong altruism,” but not to “weak altruism.” 

21 Because these terms are important only relative to one another, these four desiderata could be reduced to two desiderata - or even to a single 
desideratum if we wanted to be fully parsimonious. I separate them here for analytic clarity, but we must remember that each of the desiderata must be 
appended with a ceteris paribus clause. I thank Vlad Tarko for pointing this out to me.
2. Prosocial traits are more likely to emerge when within-group selective pressures against the trait ($|\beta_2|$) are small. Prosocial
traits that do not require extreme sacrifice are thus more likely to proliferate, but this might also involve setting up
institutional mechanisms to punish egoists and to reward altruists.

3. Prosocial traits are more likely to emerge when between-group selective pressures for altruism ($\beta_1$) are large. In times of
frequent and intense interaction, especially, some believe, when resources are scarce and interactions are agonistic, high
levels of cooperation at the social level are imperative.

4. Finally, prosocial traits are more likely to emerge when variance between groups $\mathrm{Var}(z_j)$ is large.


Returning to Buchanan’s discussion of group-beneficial social norms, these four conditions offer a way of assuaging the 
fear that the “free-rider logic would seem to apply” to norms such as the work ethic (Buchanan, 1994, 81). Buchanan reasonably
fears that individuals will under-invest in encouraging beneficial social norms, since these involve positive externalities.²²
However, when groups of hard-workers are able to bunch together (condition 1), forcing flower children and beach
bums to also bunch together (condition 4), and when the group benefits (“positive externalities”) of the work ethic are large 
(condition 3), while the individual cost is low (condition 2), then the Price equation implies that a strong work ethic will 
spread throughout the population. 

In the next section, I show how these four conditions provide the key to understanding the surprising successes of
polycentric governance arrangements. In short, the features of polycentric political organization help to fulfill each of the 
four functionality desiderata. This discussion will also address the crucial issue of monitoring and enforcement of group 
rules. 


2.4. Group fitness and group welfare 


Before applying this formal framework to the analysis of polycentric governance structures, a conceptual issue requires 
clarification. The Price equation tells us what will make a rule adaptive at the global level. Rules that raise the average 
fitness of individuals within a group will spread to individuals in other groups via imitation, immigration, conflict, or some 
combination of forces. The question at issue, however, is why polycentric competition often provides good governance, i.e., 
why it often promotes human welfare. But is there any reason to suppose that group fitness corresponds to group welfare? 

Many have argued forcefully against this supposition. James Buchanan, for instance, has accused F.A. Hayek of adhering 
to the Panglossian fantasy that whatever evolves must be desirable. “My basic criticism of F.A. Hayek’s profound
interpretation of modern history and his diagnoses for improvement is directed at his apparent belief or faith that social
evolution will, in fact, insure the survival of efficient institutional forms” (Buchanan, 1975, 211). Buchanan holds that
Hayek’s position amounts to normative evolutionism, that is, that we should passively accept the outcome of any evolutionary process 
(Buchanan, 2001, 312). In a similar vein, Dan Dennett has criticized a host of normative evolutionists for failing to explain 
why evolutionary outcomes should correspond to consciously chosen human values (Dennett, 1995, 468). Would survival 
of one’s culture “justify mass murder, for instance, or betraying all your friends?” Dennett asks. Clearly not. So, why, then, 
should we associate evolutionary success with normative desirability?²³ 

Certainly, there are ways in which norms with greater relative fitness can spread at the expense of human welfare, e.g. 
norms that promote violent conquest and ideological indoctrination of other groups. Such norms may be good at spreading 
themselves despite the fact that they are unpleasant for everyone, including members of the group in which they prevail. 
Just as obviously, however, a rule can attain greater relative fitness in ways that enhance group welfare, e.g. through
providing cultural or economic advantages. To determine which traits will have a relative advantage requires specifying the 
mechanism of selection. If selection occurs through the ability to excel at violence, or to impose other costs on competing 
groups, then it is quite likely that the fitness of a rule will not correspond to its ability to enhance welfare.²⁴ If, on the other 
hand, selection occurs through the ability to induce others to adopt one’s own rules due to their intrinsic appeal or their 
apparent ability to promote the well-being of the group that adheres to them, then it is quite likely that the most fit rules 
will also be those that promote human welfare. 

In the context of cultural evolution within a polycentric framework, there are at least three reasons to think that group 
fitness does, in fact, correspond to group welfare. These reasons allow us to discount the worries put forth by Buchanan, 
Dennett, and others. 

First, although these worries are quite troubling for the case of genetic evolution, a major criterion of selection in cultural 
evolution is the appeal of certain rules or norms. In the case of genetic evolution, new adaptations are introduced randomly, 
and their ability to proliferate is determined purely in terms of their ability to promote relatively more offspring than 
alternative genetic sequences. By contrast, as explained above (sec. 2.2), cultural fitness is determined, at least partly, by the 
appeal of a particular rule of conduct.²⁵ As Ostrom (1999, 57-8) points out, institutional evolution is not entirely blind, but 
is partly directed by the goal of improving human welfare. Institutional changes often involve copying rules that individuals 
know will improve their well-being. Alternatively, in the absence of such detailed knowledge, individuals may simply copy 
the rules of societies that appear to be highly successful, even if they do not entirely understand why (Henrich, 2015).²⁶ 


22 Buchanan (1994, 70) uses the phrase “paying the preacher” to refer to any investment in spreading and encouraging work ethic. 

23 One response, put forth by Hayek on various occasions, is that our deepest values are themselves products of cultural and biological evolution. For 
instance, Hayek writes that “value ... can only be understood as the determinant of what people must do to maintain the overall structure” (Hayek, 1983, 
36). This response raises a host of difficulties and complexities, which, if left unaddressed, render it unconvincing. Although I believe there is some merit 
to this response, laying it out in sufficient detail would lead us far afield. 

24 Bowles and Gintis (2011, ch.8, 197) emphasize the effect of violent conflict on cultural selection, while Mesoudi (2011) emphasizes the ability of rules 
to benefit their adherents as a source of fitness. 

A second, related reason to think that group fitness will correspond to group welfare in the case of cultural evolution 
concerns the variety of positive-sum ways in which a rule can exhibit a high degree of cultural fitness. In the case of both cultural 
and genetic evolution, one way of increasing the relative fitness of a unit of selection is by reducing the number of copies 
produced by other units of selection. In the case of genetic selection, this generally involves imposing a welfare cost on 
other organisms, since shortening their lives or decreasing their resources is the simplest way to limit their reproductive 
success. In the case of cultural selection, by contrast, decreasing the fitness of another norm need not involve harming the 
welfare of the organisms adhering to it. One can make a norm less appealing simply by providing better alternatives, or 
perhaps by “reframing” the norm in a way that supports a more negative attitude towards that norm (Bicchieri, 2016, 121-2, 
126, 139). 

Although it is less likely in the case of cultural evolution than in the case of genetic evolution, sometimes the best way 
to make a norm unappealing is by imposing costs on those individuals or groups who hold that norm. Here the specific case 
under consideration, viz. cultural evolution within a polycentric framework, offers a third reason for connecting group fitness 
with group welfare. As elaborated below, polycentric orders are governed by an overarching set of rules (Tarko, 2021).²⁷ 
Such rules generally aim to reduce non-productive, zero-sum competition between groups, while, at the same time,
promoting rivalry that leads to useful institutional experimentation. To be “fit” within such a framework, a rule must offer 
apparent benefits, since the overarching set of rules limits the margins along which it can actively reduce the welfare of 
other individuals within the system. In this way, the overarching set of rules offers a framework for cultural evolution that 
supports a correlation between evolutionary fitness and human welfare. In some cases, the overarching set of rules will be 
dysfunctional. Due to this possibility, polycentricity alone does not suffice to ensure beneficial outcomes. Nevertheless, as 
argued in the next section, it does exhibit several desirable properties, which greatly increase the likelihood of beneficial 
evolutionary outcomes. 


3. Polycentricity and multilevel selection 
3.1. Defining polycentricity 


Vincent Ostrom offers a concise and now classic definition of polycentricity: 


...a polycentric political system [is] composed of: (1) many autonomous units formally independent of one another, 
(2) choosing to act in ways that take account of others, (3) through processes of cooperation, competition, conflict, 
and conflict resolution. (V. Ostrom 1991, 225) 


Other scholars have built upon this definition to provide greater precision. Especially notable are two teams of scholars. 
First, Aligica and Tarko (2012, 257) present a “concept design” that involves three key features of polycentricity, as well as a 
host of empirical indicators for each of these features. The three features are: 


1. A multiplicity of decision centers 
2. An overarching system of rules 
3. A process of evolutionary competition between the decision centers. 


A second team of scholars, Stephan, Marshall, and McGinnis (2019), provides a list of eight features of polycentric
systems. However, they consider four of these features to be of special importance: 


1. Multiple decision centers 
2. Autonomous decision-making authority for each decision center 
3. Overlapping jurisdictions of authority between the decision centers 
4. Various processes of mutual adjustment among decision centers (41). 


While several others have also provided definitions of polycentricity, they all more or less resemble the three definitions 
covered here.²⁸ 

Pulling together the various features of these three different definitions, we can identify a basic schema for the
organization of polycentric governance: 


25 ‘Rule of conduct’ is used loosely here. The same reasoning applies to any other sort of cultural replicator. 

26 Ostrom and Henrich also point out that cultural evolution takes place on much faster time scale than does biological evolution, largely due to the fact 
that it incorporates deliberate choice and intentional modifications, rather than random, blind variation. I thank an anonymous referee for suggesting the 
importance of this point. 

27 Absent an overarching set of rules, the system is not technically polycentric, but fragmented and anarchic (Tarko, 2016, 43). 

28 For alternative, though similar, definitions, see Ostrom et al. (1961), Toonen (1983), Folke et al. (2005), Aligica and Boettke (2009), and Garmestani and 
Benson (2013). 
Polycentric Political Structure: A polycentric political structure consists of rule-governed collectives with well-defined, 
and often overlapping, jurisdictions that interact in a rule-governed, competitive manner resulting in the relative
expansion or contraction of their jurisdictions. 


To reduce this definition to an orderly list, we might identify three features: 


1. Multiple decision-making units 
2. Each decision-making unit has authority over a specified, but evolving jurisdiction 
3. These decision-making units compete and cooperate in various ways, according to a set of overarching rules. 


This precise understanding of polycentricity enables an assessment of polycentric functionality in terms of the conditions 
for group functionality laid out in Section 2.3, an assessment to which we now turn. 


3.2. Facilitating multilevel selection 


The key to understanding how polycentric political organization enables beneficial outcomes is to appreciate how the fea- 
tures of polycentricity, as defined above, coincide with the conditions for group-beneficial outcomes derived in Section 2.3. 
To begin, recall the first two functionality desiderata, both of which concern the within-group components of evolutionary 
pressures: 


1. Prosocial traits are more likely to emerge when within-group variance Var(z_ij) is minimized. 
2. Prosocial traits are more likely to emerge when within-group selective pressures against the trait (|β₂|) are small. 


Polycentric organization facilitates the satisfaction of (1) by allowing like-minded individuals to coalesce into groups
centered around shared concerns. The importance of self-sorting for the maintenance of cooperation has long been recognized 
by social scientists. As Axelrod (1986, 1105) writes in his classic article on the evolution of norms, an important
“mechanism for the support of norms is voluntary membership in a group working together for a common end.” This is also a 
common point raised in favor of federalist political constitutions, a form of polycentric political organization.²⁹ In a
federalist structure, “[m]obile individuals can join that city-state having their most preferred set of rights and responsibilities” 
(Inman and Rubinfeld, 1996). More recently, the fact that polycentric orders provide a framework for voluntary coalitions 
has been emphasized by Aligica (2018), who argues that polycentric political structures enable the spontaneous formation 
of groups in which individuals coalesce around a defined “problem solving” context (Aligica, 2018, 104). Having agreed on 
the existence and nature of a pressing problem, citizens are more likely to coordinate on shared rules that seek to address 
it. These features of a problem-solving context provide reason to believe that individuals within the group are more likely 
to adhere to the prevailing social rules. Within-group variance, in other words, is minimized by the spontaneous formation 
of like-minded groups, and a polycentric political framework facilitates this formation. 

Axelrod and Aligica also believe that voluntary group formation supports effective monitoring and punishment, the pres- 
ence of which directly addresses functionality desideratum (2). 


The power of membership works in three ways. First, it directly affects the individual’s utility function, making a 
defection less attractive because to defect against a voluntarily accepted commitment would tend to lower one’s self- 
esteem. Second, group membership allows like-minded people to interact with each other, and this self-selection 
tends to make it much easier for the members to enforce the norm implicit in the agreement to form or join a group. 
Finally, the very agreement to form a group helps define what is expected of the participants, thereby clarifying when 
a defection occurs and when a punishment is called for (Axelrod, 1986, 1105-6). 


Having formed a like-minded group, with general agreement on priorities, citizens are more likely to accept
compromises, do their fair share, and censure violators (Aligica, 2018, 104). Importantly, violators themselves are more likely to 
respond positively to censure, since they accept the basis of social rules and recognize their importance for solving a
relevant problem.³⁰ 

There is general agreement among evolutionary theorists that effective monitoring and punishment can stabilize
cooperative behavior. It does so by removing the advantages of rule-breaking, thus minimizing |β₂|. However, there is also 
widespread agreement that punishment itself is costly: 


It might be argued that individuals cooperate in order to avoid punishment by other members of their own group. 
This notion seems plausible based on common experience. However, it does not solve the theoretical problem; it 
only raises the new problem of why individuals should cooperate to punish other individuals. Punishment itself is an 
investment in the production of some other public good, for example, civil order. Each potential punisher can have 
only a small incremental effect on the level of civil order, and again, the cost to the individual participating in the 
punishment of others could be substantial. The rational selfish individual would let the other person do the punishing 
(Boyd and Richerson, 1982, 238).³¹ 


29 Jan Vogler has convinced me that not all federalist systems are truly polycentric. For instance, there may not be cooperation and competition between 
political units on the same level. That said, most federalist systems do fall into the category of polycentric organizations. 

30 Whether or not the explanation offered by Axelrod and Aligica is found to be convincing, there is ample empirical evidence that cooperation
increases when individuals are allowed to enter or exit groups facing social dilemmas. See, for example, Orbell and Dawes (1993), Orbell et al. (1984), 
Schuessler (1989), and Yamagishi and Hayashi (1996). Whatever the explanation, self-sorting appears to be a powerful mechanism for stabilizing
cooperation. 


While the first-order problem of cooperation may be resolved by punishment norms, we still face a “second-order social 
dilemma (of equal or greater difficulty)” (Ostrom, 1998, 7). This is no trivial matter, since it concerns the evolutionary 
stability of group-beneficial rules (Dawkins, 2016).³² To complete the account of group functionality, then, we must consider 
whether polycentric governance structures have the capacity to support effective monitoring and punishment. 

A wide range of recent work on the evolution of punishment supports the idea that punishment norms, despite their 
altruistic character, can become widespread and stable within properly structured populations. For our purposes, there are 
two key ideas in this literature. The first is that punishment exhibits decreasing costs as it becomes more widespread. As 
Bowles and Gintis (2011, 149) put it, “punishment is characterized by increasing returns to scale, so the total cost of
punishing a particular target declines as the number of punishers increases.” In fact, to the extent that punishment is effective 
at suppressing norm violations, “... the payoff disadvantage of punishers relative to contributors approaches zero as
defectors become rare because there is no need for punishment” (Boyd et al., 2003, 3531). In other words, once punishment is 
widespread and effective within a group, there is little to no fitness differential between punishers and non-punishers. So, 
if the punishment norm is the trait in question, then |β₂| will be extremely low, permitting group selection to overpower 
individual-level selection in the spread of punishment norms (Boyd et al., 2003, 3534). 
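
The shrinking payoff gap can be illustrated with a deliberately stylized calculation. This is a sketch under stated assumptions, not the model of Boyd et al. (2003): it supposes that a punisher sanctions each defector it meets at a hypothetical fixed cost `cost_per_sanction`, that defectors are met in proportion to their share of the group, and that non-punishing contributors bear no such cost. The function name and parameters are illustrative.

```python
def punisher_disadvantage(defector_share, cost_per_sanction):
    """Expected payoff gap between a punisher and a non-punishing contributor.

    Stylized assumption: the punisher pays cost_per_sanction for every
    defector it meets, and meets defectors in proportion to their share
    of the group. Contributors who do not punish pay nothing extra.
    """
    if not 0.0 <= defector_share <= 1.0:
        raise ValueError("defector_share must lie in [0, 1]")
    return cost_per_sanction * defector_share


# As effective punishment makes defectors rare, the gap shrinks towards zero.
for share in (0.5, 0.2, 0.05, 0.0):
    gap = punisher_disadvantage(share, cost_per_sanction=0.2)
    print(f"defector share {share:.2f}: punisher disadvantage {gap:.3f}")
```

Under these assumptions the within-group selective pressure against the punishment norm is proportional to the share of defectors, so it vanishes once punishment has made defection rare.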

The idea of decreasing punishment costs does not, however, explain how punishment might arise in the first place. To 
explain the emergence, rather than the mere stability, of punishment norms requires the second key idea: self-sorting. In 
general, cooperative behaviors are far more stable when cooperators are able to identify themselves and group together 
(Eshel and Cavalli-Sforza, 1982). In the case of punishment, the increasing returns to scale noted by Bowles and Gintis 
(2011, 149) make self-sorting all the more effective.³³ Boyd et al. (2010) develop a model in which individuals can signal 
to one another that they are committed to the same norms and are willing to enforce them by punishing defectors. This 
ability to form coalitions with like-minded individuals reduces the risk that one will suffer major harms by attempting 
to unilaterally punish defectors. Self-sorting thus supports the spread of punishment norms by allowing for coordinated 
punishment efforts. In this scenario, therefore, punishment norms can proliferate even when they start off at extremely low 
levels. Moreover, the logic of this model provides theoretical intuition for the empirical finding that individuals are more 
willing to engage in costly punishment when they are able to communicate and thereby coordinate their punishing behavior, 
a phenomenon observed in the laboratory (Ostrom et al., 1992, 405), as well as in the field (Ostrom, 2014, 244). 

The many articles cited above suggest that self-sorting will support effective punishment and thereby reduce |β₂|. The 
key lesson, for our purposes, is that punishment norms can proliferate and stabilize cooperation when willing punishers are 
able to sort themselves into groups. When the network of interaction supports non-random encounters, so that cooperators 
and willing punishers are more likely to interact, then punishment can become common within the group and functionality 
desideratum (2) will be satisfied. As argued above, the capacity to self-sort into a community with shared priorities or interests 
is a key design feature of polycentric organizations. For this reason, polycentric governance supports effective punishment 
norms and hence the satisfaction of functionality desideratum (2). 

The second two functionality desiderata concern the group-level forces: 


3. Prosocial rules are more likely to emerge when between-group selective pressures (β₁) are large. 
4. Prosocial rules are more likely to emerge when variance between groups Var(z_j) is large. 


The third feature in our definition of a polycentric political structure - that decision-making units interact and compete 
- favors the increase of β₁. When jurisdictions with diverse rule sets are in a state of constant interaction, members of 
other jurisdictions become familiar with the alternative rules and with their effects on well-being. This, in turn, 
heightens inter-unit competition. If a unit’s jurisdiction is geographical, it is more likely to engage in Tiebout
competition by enticing individuals to “vote with their feet” (Tiebout, 1956). If the jurisdiction is non-geographical, individuals may 
simply switch to the governance provider that yields better results at lower costs. In this way, interaction between
jurisdictions intensifies group-level selective pressures by increasing competition in the standard way familiar to all economists. 
Rules that fail to entice adherents thus face more rapid decline than they would under autarkic conditions with lower levels 
of competition. 

31 See also Ostrom (1998, 7). 

32 For the original arguments, see Williams (1966). 

33 The increasing returns dynamic is also discussed by Boyd et al. (2003, 3531), Henrich (2004, 26-7), Henrich and Boyd (2001, 81), and Boyd et al. (2010, 
617-8). 

34 It’s important to bear in mind that jurisdictions may or may not be territorial. Religion, for example, has been analyzed as a polycentric order and its 
jurisdictions are defined by its members, not by a geographical area (Gill, 2020). 


The first and second features in our definition of a polycentric political structure - multiple decision-making units with 
authority over a specified jurisdiction - favor prosociality by increasing Var(z_j). Boundaries between groups are crucial for 
developing distinctive sets of rules; institutional diversity presupposes distinct jurisdictions.³⁴ Without well-defined groups, 
in which individuals share the same rules, the variance between collectives will either be less pronounced or undefined, 
since the collectives themselves will be undefined. The capacity of polycentric systems to support institutional diversity 
has been elaborated by Aligica (2014): “a polycentric system is the embodiment of institutional pluralism....The pluralism of 
institutional forms ensures a variety of responses to a variety of circumstances....[O]ne can hardly think of a better arena for 
experimentation than polycentricity” (66). As Aligica points out, polycentric organization offers an institutional framework 
in which groups or coalitions can experiment with a wide array of different rules to respond to a wide variety of different 
concerns. The result is that different groups within society will form around different sets of rules, thus increasing the 
diversity between groups and satisfying functionality desideratum (4). 
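
The joint effect of self-sorting on desiderata (1) and (4) can be seen in a small numerical sketch. The trait values below are invented for illustration, and the helper names `variance_components` and `partition` are hypothetical. The sketch assumes equal-sized groups and compares an unsorted partition of a population with one in which like-minded individuals are grouped together. Because the overall variance is fixed, sorting can only move variance out of the within-group component and into the between-group component.

```python
from statistics import fmean, pvariance


def variance_components(groups):
    """Return (mean within-group variance, variance of the group means)."""
    within = fmean(pvariance(g) for g in groups)
    between = pvariance([fmean(g) for g in groups])
    return within, between


def partition(values, n_groups):
    """Cut a list into n_groups consecutive, equal-sized groups."""
    size = len(values) // n_groups
    return [values[i * size:(i + 1) * size] for i in range(n_groups)]


# Hypothetical degrees of adherence to some rule, one value per individual.
traits = [0.1, 0.9, 0.2, 0.8, 0.15, 0.85, 0.05, 0.95, 0.3, 0.7, 0.25, 0.75]

unsorted_groups = partition(traits, 3)          # groups formed without sorting
sorted_groups = partition(sorted(traits), 3)    # like-minded individuals together

for label, groups in (("unsorted", unsorted_groups), ("sorted", sorted_groups)):
    within, between = variance_components(groups)
    print(f"{label}: within = {within:.4f}, between = {between:.4f}")
```

In this example sorting lowers the within-group component and raises the between-group component by the same amount, which is the sense in which a single feature of polycentric organization, the freedom to form like-minded groups, bears on both desiderata at once.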

In sum, polycentric governance structures directly address all four functionality desiderata. The aim has not been to show 
that polycentricity will guarantee that any of these desiderata will be satisfied. Instead, the aim has been to demonstrate that 
polycentric structures support these desiderata, thus increasing the likelihood that the social system will satisfy them. When 
these desiderata are all satisfied, group-level selection will overpower individual-level selection. In such a case, the outcome 
will be higher levels of social functionality. The Price equation framework thus offers a theoretical tool for understanding 
how polycentric governance structures can be effective even though they lack a centrally-orchestrated plan for achieving 
beneficial outcomes. 


3.3. Polycentricity in action 


The discussion thus far has been highly abstract. To clarify the theory expounded here, it is worth considering a concrete 
example of polycentric governance in action. A classic example of polycentric governance is the set of institutions,
comprising formal and informal rules, that regulates the scientific community.³⁵ Polanyi (1951), who originally articulated the 
concept of polycentricity, presents science, as practiced in countries like the United Kingdom, as the paradigmatic example 
of polycentric organization. More recent work has confirmed and deepened Polanyi’s analysis (Tarko, 2015). This subsec- 
tion will show how science fits the definition of polycentricity and demonstrate that this allows it to fulfill functionality 
desiderata (1)-(4). 

To understand the polycentric organization of science, it helps to contrast it with an example of non-polycentric science. 
At the time when Polanyi was writing on polycentricity, the Soviet Union was engaged in a large-scale planning experiment. 
Their monocentric governance approach extended to science, as well, where the Soviet government laid down methods, 
rules, and doctrines for scientific researchers. This was especially true in the area of research on genetics, where Trofim 
Denisovich Lysenko, director of the USSR’s Institute of Genetics, persecuted scientists who pursued research in areas that 
he deemed to be pseudo-scientific or corrupted by bourgeois ideas. Most notably, Lysenko rejected Mendelian genetics, and, 
consequently, outlawed research in this area. Polanyi (1951, 107) cites Lysenko’s ordered execution of Nikolai Vavilov, but 
there are several other notable examples, such as the geneticist Nikolai Belyaev.³⁶ In short, scientists who did not adhere to 
the official state views on science were punished severely by officials of the centralized state. 

This monocentric organizational scheme stands in stark contrast to the polycentric scheme that prevails in free nations 
around the world. Scientific progress occurs as parallel research teams - often employing different methods and accepting 
different theoretical premises - compete for publications, citations, and awards. Though diverse in many ways, these dif- 
ferent teams generally accept a thin set of professional norms, including a commitment to seeking truth and to publicizing 
their findings and evidence (Smolin, 2006, 301). As Tarko (2015) explains, these features of the scientific community suffice 
to make it a form of polycentric organization. The decentralized teams constitute a set of decision-making units, satisfy- 
ing the first feature of polycentricity as laid out in Section 3.1. In addition, decision-making units have broad autonomy in 
determining the kind of research they will carry out, along with the kind of research they will valorize by citing or build- 
ing upon. This realm of authority, sometimes physically housed in formal institutions, such as universities or government- 
funded research centers, constitutes a specified jurisdiction, thus satisfying the second feature of polycentric organization. 
Finally, these research teams compete in various ways: to attract funding, to publish papers, to garner citations, and to win 
awards. They also cooperate by sharing data, further developing one another’s theories, or offering constructive criticism at 
professional conferences. These various ways of interacting conform to a set of overarching rules, a general “shared ethic” 
(Smolin, 2006, 301), that allows them to improve the state of science through their interactions. The organization of the 
scientific community thus satisfies the third feature of polycentric organization, as well. 

The theory developed here concerns the ability of polycentric organizations to achieve successful outcomes, and in the 
case of science, this success is spectacular: 


The scientific community is arguably one of the most successful human organizations ever created, both with respect 
to its declared main purpose (truth-seeking) and with respect to secondary goals such as obtaining large government 
subsidies (while maintaining independence and freedom from interference) and obtaining preferential treatment in 
public schools or in courts of law (despite often being highly disruptive to common belief systems) (Tarko, 2015, 64) 


35 Other real-world examples include certain metropolitan governance systems (Ostrom et al., 1961), political federations, such as the European Union 
(Vogler, 2020), and decentralized, competitive resource management schemes, such as the CAMPFIRE project in Zimbabwe (Schmidtz, 1997). 

36 Dmitri Belyaev, the brother of Nikolai, went on to discover several genetic principles behind domestication, but was forced to do so in secret, due to 
his fear of upsetting party officials (Wrangham, 2019, 67ff.). 
What I have called “the puzzle of polycentric functionality” emerges from an appreciation of what can go wrong when 
individuals engage in unplanned, decentralized activities. As Polanyi (1951, 108) puts it, “Suppose we started building a 
house without any plans, each workman adding his part according to his own ideas, using whatever materials he preferred, 
putting in bricks or timber, lead pipes or floorboards as he thought fit. Surely the result would be a hopeless confusion.” 
In response to this puzzle, Polanyi (1951, 109) asserts that the “nature of scientific systems is more akin to the ordered 
arrangement of living cells which constitute a polycellular organism” than it is to the construction of a home. In his own 
intuitive way, Polanyi has therefore suggested that the success of science can be understood in terms of multilevel selection. 
The transition from unicellular to multicellular organisms is accounted for by increasing pressures at the cellular-group level 
and decreasing pressures at the individual cellular level (Smith and Szathmary, 1997). Polanyi’s analogy therefore evokes 
exactly the type of explanation provided here, although his statement anticipates by twenty years the formal analysis of 
multilevel selection in the Price equation framework. 

Following the lead of Polanyi, we can use the framework developed above to explain how the polycentric organization of 
science allows researchers to operate like cells in a body. The polycentric nature of science means that there is no monopoly 
provider of research; researchers can self-sort into teams of like-minded researchers, intent on employing similar methods 
to pursue knowledge of similar topics. Desideratum (1), which requires reducing within-group variance, is naturally satisfied 
in this way. 

Satisfying desideratum (2), which requires the minimization of within-group selective pressures, is not terribly difficult 
in a context where the team members share a common fate. Individuals who refuse to accept the norms (i.e. the premises 
and methods) of their research team will likely disrupt the process of research and publication, hurting themselves in the 
process. This is, therefore, an example of “inclusive fitness,” which theorists have shown to be conducive to promoting 
altruistic traits (Hamilton, 1964).³⁷ There will likely be cases where the joint nature of success is still insufficient to suppress 
rogue scientists who threaten to hurt the team’s progress or lazy scientists who free ride on the efforts of others. Although 
far less severe than Soviet-style punishment, research teams possess their own battery of punitive measures for researchers 
who reject the prevalent norms of their institution. These may include refusing to include a scientist on publications and, in 
the academic context, refusing to grant tenure. In addition, such scientists may simply be ignored: “[t]he ultimate fate of the 
entrant who disagrees with the orthodoxy but cannot persuade the community to accept his point of view is, quite simply, 
isolation within or banishment from the community” (Kendall, 1960, 979). Even highly prestigious individuals who refuse to 
accept the norms of their research team or their broader community wind up languishing in isolation. As Tarko (2015, 69) 
explains, this was the fate of Albert Einstein when he rejected the prevailing interpretation of quantum mechanics. 

Desideratum (3) concerns the strength of between-group selective pressures. In the scientific community, these pressures 
are exerted in a decentralized fashion. They include social prestige, funding, academic titles, and awards. They are powerful 
motivators, since they determine the career success and reputation of those who devote their lives to research. Important, 
in this respect, is that research teams typically sink or swim together, and teams regularly view themselves as competing 
with other teams. One of the most well-known examples of a highly competitive interaction between research teams is the 
race which unfolded throughout the 1950s to discover how the various parts of the DNA molecule fit together. Two leading 
teams emerged: one, led by Watson and Crick, made a conscious effort to outpace the other, led by Wilkins and Franklin.
The result was a major scientific breakthrough. Successful teams, as the DNA example underlines, become highly influential 
in the field of science, and their “norms” - that is, their basic premises and methods - are copied by other research teams 
hoping to make their own contributions and acquire their own prestige. 

Finally, desideratum (4), which requires diversity between groups, may be the most prominent benefit of decentralized 
scientific organization. Although there are areas of relatively settled science, in which an overwhelming consensus prevails, 
on many issues disagreement is quite common. In such areas, pluralism is crucial, as it allows “as many trails as possible 
[to] be covered” (Polanyi, 1951, 110). As Tarko (2015, 71) explains, 


As long as there are grounds for reasonable people to disagree, the polycentric nature of the scientific community is 
crucial for its success because it is this polycentric organization that secures the diversity of opinions. It is not enough 
to rely on individual scientists being creative and able to “think outside of the box.” It is essential for them to have 
institutional environments where they can pursue their viewpoints. 


The polycentric organization of science provides such institutional environments, and in this way supports a greater 
diversity between research units than a monocentric system like that implemented by the Soviet Union. 

In sum, the polycentric organization of scientific inquiry supports the conditions required for promoting cooperative 
norms and suppressing opportunistic behavior. In this regard, it greatly outperforms more monocentric alternatives. The 
main thesis of this paper is that the success of polycentric organizations can be explained by their capacity to promote 
higher-level selection pressures and suppress lower-level ones. The search for truth undertaken by the highly polycentric 
scientific community demonstrates how this capacity operates in real world institutional settings. 


37 More precisely, from the replicator’s perspective, these traits cease to be altruistic, since harming others entails harming oneself. Norms that permit 
this will not proliferate. 
4. Conclusion


Drawing on the theory of multilevel selection, this paper has addressed the question of why polycentric political struc- 
tures yield successful outcomes. Polycentricity allows a system to satisfy four functionality desiderata derived from the 
multilevel Price equation. These desiderata represent the conditions under which group benefits exert more force relative 
to individual benefits. Polycentric organizations structure human interactions so as to increase the likelihood of satisfying
these desiderata. Through the prism of polycentricity, individual behaviors are synthesized into group-level adaptations. 

It is important to emphasize, however, that polycentric structures do not ensure group-level success. In particular, mech- 
anisms to ensure effective monitoring and enforcement may not develop, even though polycentric structures increase the 
likelihood of such mechanisms. From a theoretical perspective, the Price equation suggests that polycentric governance re- 
quires some way of reducing the within-group benefit of rule-violating behavior. Empirical case studies confirm this theo- 
retical insight: several of Elinor Ostrom’s design principles can be understood precisely in terms of the need to reduce the 
benefits of rule violation, and her case studies demonstrate how failure ensues when a self-governing community fails to es- 
tablish effective monitoring and punishment (Ostrom, 2016, Ch. 5). Group functionality can hardly emerge amidst ubiquitous 
defection and free-riding. As argued in Section 3.2, however, work on the evolution of punishment suggests that polycentric 
structures support, even if they don’t guarantee, the development of effective punishment norms. 

This paper has sought to create a conceptual bridge between Ostromian political economy and multilevel selection theory. 
Certain features of polycentric institutional structures - i.e. well-defined, competing jurisdictions with rule-making auton- 
omy - provide the requisite conditions for group-level adaptation to take place. In a polycentric structure, group-beneficial 
norms, including norms of monitoring and enforcement, are likely to spread. 

A crucial question, however, remains unanswered: how do we explain the emergence of polycentric governance struc- 
tures themselves? Addressing this question presents an opportunity to deepen our understanding of polycentricity along 
both historical and theoretical lines. Must polycentric institutions be designed and imposed or can they themselves evolve 
spontaneously? If so, under what conditions should we expect such institutions to evolve? Because polycentric institutions 
are themselves frameworks for group-beneficial adaptations, these questions are related to exciting work on the evolution 
of evolvability (Wagner and Altenberg, 1996). 
Appendix A


George Price wanted a simple, but extremely general equation to describe the change in the average level of some char- 
acter trait between “generations.” The word “generations” is in quotations, since Price’s equation describes even processes 
that take place without any genetic relatedness, indeed, without any genes whatsoever. The Price equation thus describes 
technological or cultural evolution just as well as genetic evolution. 


A1. Deriving Price’s equation


A1.1. Set-up
• There is a population P consisting of n entities.
  ◦ ‘P’ stands for ‘Parent’.
• z: A (phenotypic) trait of some kind, which can be measured (discretely or continuously) with real numbers.
  ◦ $z_i$: The level of z exhibited by the entity $i \in P = \{1, \ldots, n\}$.
• $\bar{z}$: The average level of z in the P-population.
  ◦ Mathematically, $\bar{z} = \frac{1}{n}\sum_{i=1}^{n} z_i$.
  ◦ $z'_i$: The amount of trait z transmitted by i to any of its “offspring.”
• $\Delta z_i$: The change in the level of z from one entity to its offspring.
  ◦ $\Delta z_i = z'_i - z_i$.
  ◦ $\Delta z_i$ can be thought of as the copying fidelity or transmission bias of trait z for entity i.
  ◦ Taking an average across all entities in P, the expectation of $\Delta z_i$ is $E[\Delta z] = \frac{1}{n}\sum_{i=1}^{n} \Delta z_i$.
• $w_i$: The average number of offspring produced by entity i.
  ◦ $w_i$ can be thought of as the “fitness” of entity i.
  ◦ Similarly to $\bar{z}$, we define the average fitness as $\bar{w} = \frac{1}{n}\sum_{i=1}^{n} w_i$.
  ◦ Often we will be interested, not in the absolute fitness of an entity, but in its fitness relative to the other entities in
    the group. For this comparative purpose, we define relative fitness of entity i as $\nu_i = w_i / \bar{w}$.
Fig. A.1. The Partition Theorem Illustrated by a Truly Great Artist.


• There is another population of interest, that comprised of the offspring of all entities in P. Call this population O.
  ◦ ‘O’ stands for ‘Offspring’.
  ◦ $\bar{z}_O$: The average level of the z trait in population O.
  ◦ Mathematically,

$$\bar{z}_O = \frac{1}{n}\sum_{i=1}^{n} \nu_i z'_i \qquad (6)$$

  ◦ Importantly, for the derivation, $E[\nu] = \frac{1}{n}\sum_{i=1}^{n} \nu_i = 1$, which makes sense given that $\nu_i / n$ is the proportion of the total
    O-population descended from entity i.


The Partition Theorem. Equation (6), while it describes the O-population, is couched entirely in terms of P-population
traits. This may seem somewhat mysterious, but it is the most crucial equation to understand for the derivation that
follows. This equation is easy to grasp once one understands the so-called partition theorem.38 The idea is actually
quite intuitive. Suppose we divide the total population into a set of jointly exhaustive and mutually exclusive subpop- 
ulations. Then the average level of a trait, e.g. z, within the population as a whole will simply be the (weighted) sum 
of the averages of each subpopulation, where each one is weighted by its relative size. This idea can be illustrated 
with a simple diagram. 

In Fig. A.1, we have divided a population comprised of dots into four subpopulations, $A_1$ to $A_4$. We might think
of each of these subpopulations as the offspring of a single entity in the P-population, so that $A_1$ consists entirely
of entity 1’s offspring, $A_2$ of entity 2’s, and so on. Suppose the size of each dot represents the magnitude of our
quantity of interest (here, the amount of trait z each entity in the O-population has). The partition theorem states
that we should add up the average z level in each of the subpopulations $A_1$ to $A_4$, weighting each of these by the
proportion of the total entities that they contain. Here, $A_1$ has many entities, 4/10 = 40% of them, but these entities
don’t possess very much of the z-trait. So, they will drag the average down much more than, say, $A_2$, which has a
modest amount of z-trait and only comprises 1/10 = 10% of the total population. $A_3$ will surely raise the average, but
not by too much, since it only contains 2 entities, 20% of the population. $A_4$, on the other hand, contains a sizeable
30% of the total population, but the average amount of z-trait in $A_4$ seems neither large nor small. Hence, adding $A_4$
to the calculation is unlikely to significantly raise or lower the running average. Letting $\bar{A}_j$ denote the average level of
z-trait in subpopulation $j \in \{1, 2, 3, 4\}$, the partition theorem tells us that the average (i.e. the expected value) of the
whole population will be:


$$\frac{4}{10}\bar{A}_1 + \frac{1}{10}\bar{A}_2 + \frac{2}{10}\bar{A}_3 + \frac{3}{10}\bar{A}_4$$

The reasoning applied here is exactly the reasoning that underlies the partition theorem, and if you understood this
reasoning, then you are (at least) very close to understanding why $\bar{z}_O = \frac{1}{n}\sum_{i=1}^{n} \nu_i z'_i$.
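
The weighting logic can be checked numerically. The following is a minimal sketch, not part of the original derivation: the subpopulation sizes (4, 1, 2 and 3) follow the example above, but the z-values themselves are made up for illustration, and the names `subpops`, `weighted` and `pooled` are hypothetical.

```python
from fractions import Fraction

# Hypothetical z-levels for the offspring in four subpopulations A1..A4.
# Sizes (4, 1, 2, 3) follow the example in the text; the values are invented.
subpops = {
    "A1": [1, 1, 2, 2],   # many entities, little z-trait
    "A2": [4],            # one entity, modest z-trait
    "A3": [8, 9],         # two entities, a lot of z-trait
    "A4": [4, 5, 6],      # three entities, middling z-trait
}

def mean(xs):
    """Exact arithmetic mean of a list of integers."""
    return Fraction(sum(xs), len(xs))

total = sum(len(xs) for xs in subpops.values())

# Partition theorem: weight each subpopulation mean by its population share.
weighted = sum(Fraction(len(xs), total) * mean(xs) for xs in subpops.values())

# Direct average over every entity in the pooled population.
pooled = mean([z for xs in subpops.values() for z in xs])

print(weighted, pooled)  # 21/5 21/5
```

Exact fractions are used so that the two averages can be compared for equality rather than up to rounding error.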


A1.2. Deriving $\Delta\bar{z}$

Again, what we're after is a simple expression that describes the change in the average level of the z-trait. We will denote
this quantity $\Delta\bar{z}$. Now, to begin this derivation, we simply note the fact that $\Delta\bar{z}$ must be equal to the difference between
the average in the new O-population and the average in the old P-population.


$$\Delta\bar{z} = \bar{z}_O - \bar{z}$$


38 See any probability textbook for an explanation. A particularly nice statement, couched in terms of expected values, rather than probabilities, can be 
found in Grimmett and Welsh (2014, 34). 




A. Schaefer Journal of Economic Behavior and Organization 210 (2023) 265-287 


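Recall, $z'_i = \Delta z_i + z_i$.

Hence,

$$
\begin{aligned}
\Delta\bar{z} &= \frac{1}{n}\sum_{i=1}^{n} \nu_i z'_i - \frac{1}{n}\sum_{i=1}^{n} z_i \\
&= \frac{1}{n}\sum_{i=1}^{n} \nu_i z_i - \frac{1}{n}\sum_{i=1}^{n} z_i + \frac{1}{n}\sum_{i=1}^{n} \nu_i \Delta z_i \\
&= \frac{1}{n}\sum_{i=1}^{n} \nu_i z_i - E[\nu]\,\frac{1}{n}\sum_{i=1}^{n} z_i + \frac{1}{n}\sum_{i=1}^{n} \nu_i \Delta z_i \\
&= E[\nu z] - E[\nu]E[z] + E[\nu \Delta z] \qquad (*)
\end{aligned}
$$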


Covariance. The covariance between two random variables, X and Y, with mean values $\mu_X$ and $\mu_Y$, respectively, is
defined as


$$\mathrm{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)] = E[XY] - E[X]E[Y].$$


Therefore, 
$$\Delta\bar{z} = \mathrm{Cov}(\nu, z) + E[\nu \Delta z]. \qquad \text{(Price Eq.)}$$


For some reason, presumably mathematical tractability, biologists typically prefer to multiply (Price Eq.) through by $\bar{w}$,
yielding an equivalent, but more familiar, statement of the Price equation:


$$\bar{w}\Delta\bar{z} = \mathrm{Cov}(w, z) + E[w\Delta z] \qquad (2)$$


Notice that the only real change is that $\nu$ has been replaced with w. This is just because $\nu_i = w_i/\bar{w}$, and we multiplied
through by $\bar{w}$. The mathematical details here are both simple and unenlightening, so I will spare the reader. To obtain Eq. (2),
simply multiply equation (*) by $\bar{w}$ and simplify.39
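
Both forms of the Price equation can be verified on a toy population. The sketch below is illustrative only: the four parents and their values of z, w and z' are invented, and the variable names (`nu`, `dz`, `z_bar_O`, and so on) are hypothetical labels for the quantities defined in the set-up.

```python
from fractions import Fraction as F

# Hypothetical parent population (all values are made up for illustration):
# trait level z, offspring count w, and transmitted trait level z_prime.
z       = [F(2), F(4), F(6), F(8)]
w       = [F(1), F(2), F(3), F(2)]
z_prime = [F(2), F(5), F(6), F(7)]

def E(xs):
    """Expectation as a plain average over the parent population."""
    xs = list(xs)
    return sum(xs) / len(xs)

def cov(xs, ys):
    """Cov(X, Y) = E[XY] - E[X]E[Y]."""
    return E(x * y for x, y in zip(xs, ys)) - E(xs) * E(ys)

w_bar = E(w)
nu = [wi / w_bar for wi in w]                  # relative fitness
dz = [zp - zi for zp, zi in zip(z_prime, z)]   # transmission bias

# Average trait in the offspring population: each parent's transmitted
# trait counts once per offspring it produces.
z_bar_O = sum(wi * zp for wi, zp in zip(w, z_prime)) / sum(w)
delta_z_bar = z_bar_O - E(z)

price_rhs = cov(nu, z) + E(v * d for v, d in zip(nu, dz))    # (Price Eq.)
eq2_rhs = cov(w, z) + E(wi * d for wi, d in zip(w, dz))      # Eq. (2)

print(delta_z_bar, price_rhs, w_bar * delta_z_bar, eq2_rhs)  # 1/2 1/2 1 1
```

In this invented example the transmission term happens to vanish on average, so the whole change in the mean is attributable to the covariance between relative fitness and the trait.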


A2. Deriving the multilevel Price equation


The key idea of the multilevel version of the Price equation is to partition selective pressures into two categories: one 
between “collectives” and one between “particles,” i.e. the elements which make up the collectives. Building from Eq. (2), 
there are two ways of arriving at the multilevel form of the Price equation. The top-down approach relies on the neat
recursive trick of inserting the Price equation into itself, but indexing the inserted version to a lower level of selection. The 
bottom-up approach also relies on a mathematical trick, that of decomposing covariance into a within-partition term and a 
between-partition term. 


A2.1. Top-Down 
Recall Eq. (2): 


$$\bar{w}\Delta\bar{z} = \mathrm{Cov}(w, z) + E[w\Delta z].$$


The covariance and expectation terms of this equation are taken over entities $i \in P$. In this top-down version of the derivation,
we will assume that each entity i is itself a collective. That is, each $i \in P$ is itself made up of adaptive particles. We
might think of each i as a group made up of individual organisms, or as an organism made up of genes which are some- 
times capable of manipulating the meiotic process so as to increase their own spread, often at the expense of the progeny- 
organisms (meiotic drive). 

To formalize this new situation, we will need some additional notation. Let $j \in i$ be particles in group i. For ease of
mathematics, and without loss of generality, we assume that all groups i are of the same size. In this context, we will
slightly abuse notation to draw an important distinction: $\mathrm{Cov}_i(w, z)$ will represent the covariance between group-fitness (i.e.
the average number of offspring produced by the particles in i) and group-trait level (i.e. the average level of z possessed by
the particles in i). $\mathrm{Cov}_j(w, z)$, on the other hand, will represent the covariance between particle-fitness (i.e. the number of
offspring produced by each $j \in i$) and particle-trait level (i.e. the level of z possessed by each $j \in i$) within some group i. Similarly,
$E_i[w\Delta z]$ will represent the expectation of the product of group-fitness (i.e. the average number of offspring produced by the
particles in i) multiplied by the change in group-trait level (i.e. the change in average level of z possessed by the particles
in i between generations). $E_j[w\Delta z]$, on the other hand, will represent the expectation of particle-fitness (i.e. the number of
offspring produced by each $j \in i$) multiplied by the change in the individuals’ levels of z-trait (i.e. the change in the level of z possessed
by each $j \in i$). More briefly, when the expectation is indexed by i, the expectation sums across the groups i that partition P.
When the expectation is indexed by j, the expectation sums across particles j that comprise some specified group i. And
similarly for covariance.


39 To see a derivation of (2) instead of (Price Eq.), see Okasha (2006, ch.1). 
We will derive the following:


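Lemma. In the multilevel context, Eq. (2) is equivalent to the following:

$$\bar{w}\Delta\bar{z} = \mathrm{Cov}_i(w, z) + E_i[\mathrm{Cov}_j(w, z) + E_j[w\Delta z]] \qquad (7)$$


Proof. Starting with Eq. (2) and inserting our new notation...

$$
\begin{aligned}
\bar{w}\Delta\bar{z} &= \mathrm{Cov}(w, z) + E[w\Delta z] \\
&= \mathrm{Cov}_i(w, z) + E_i[w\Delta z] \qquad \text{(i-level)}
\end{aligned}
$$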


That is, we apply the Price equation at the level of collectives, made up of lower level particles that are not yet repre- 
sented in the equation. We are moving in a “top-down” direction. Let m be the number of particles in each group and n be 
the number of groups. To represent the lower-level particles explicitly, we note that... 


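$$
\begin{aligned}
E_i[w\Delta z] &= \frac{1}{n}\sum_{i=1}^{n} w_i\,\Delta z_i \\
&= \frac{1}{n}\sum_{i=1}^{n}\left(\frac{1}{m}\sum_{j=1}^{m} w_j\right)\Delta\!\left(\frac{1}{m}\sum_{j=1}^{m} z_j\right) \\
&= \frac{1}{n}\sum_{i=1}^{n}\left(\bar{w}_j\,\Delta\bar{z}_j\right) \\
&= E_i[\bar{w}_j\Delta\bar{z}_j]
\end{aligned}
$$

Here, we note that the expression within the expectation is identical to the left-hand side of Eq. (2), except that it is indexed
to the lower level of particles j, rather than the level of collectives i. Accordingly, we can pull the clever move I alluded to
above, of recursively inserting the Price equation into itself:

$$
\begin{aligned}
E_i[w\Delta z] &= E_i[\bar{w}_j\Delta\bar{z}_j] \\
&= E_i[\mathrm{Cov}_j(w, z) + E_j[w\Delta z]] \qquad \text{(j-level)}
\end{aligned}
$$

We now simply insert (j-level) into the expression (i-level) to yield our result:

$$\bar{w}\Delta\bar{z} = \mathrm{Cov}_i(w, z) + E_i[\mathrm{Cov}_j(w, z) + E_j[w\Delta z]] \qquad (7)$$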


Before re-deriving (7) from the bottom up, consider its interpretation. This equation partitions the selective forces
into two levels: the i-level of collectives, represented by $\mathrm{Cov}_i(w, z)$, and the j-level of particles, represented by
$E_i[\mathrm{Cov}_j(w, z) + E_j[w\Delta z]]$. What would happen if we eliminated evolutionary pressures at the particle level by assuming
that all particles “breed true” ($z'_j = z_j$ and hence $E_j[w\Delta z] = 0$) and that they all have the same fitness (so that $z_j$ is not
correlated with $w_j$)? We are left only with $\mathrm{Cov}_i(w, z)$, which measures how collective-level fitness (the average fitness of
a collective’s particles) corresponds to collective-level z-trait (the average z-trait level of a collective’s particles). In other
words, we are left with selection at the level of collectives. Alternatively, suppose that there is no collective-level selection.
That is, a higher average level of the z-trait within a group does not correspond to greater proliferation of its members relative
to other groups. Perhaps all collectives have the same fitness, or, for whatever other reason, $w_i$ is not correlated with $z_i$.
Then, of course, $\mathrm{Cov}_i(w, z) = 0$ and we are left only with the term $E_i[\mathrm{Cov}_j(w, z) + E_j[w\Delta z]]$, which takes the average across
groups of the evolutionary outcomes that occur within groups at the particle level. In other words, the evolutionary process
is entirely determined by selection and transmission bias at the particle-level.
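
The two limiting cases just described can be illustrated numerically. The following sketch assumes equally sized collectives, as in the derivation; the data and the helper name `multilevel_terms` are hypothetical and serve only to show Eq. (7) holding exactly, and the particle-level term vanishing once particles within each collective breed true and share the same fitness.

```python
from fractions import Fraction as F

def mean(xs):
    xs = list(xs)
    return sum(xs) / F(len(xs))

def cov(xs, ys):
    return mean(x * y for x, y in zip(xs, ys)) - mean(xs) * mean(ys)

def multilevel_terms(collectives):
    """Return (lhs, between, within) of Eq. (7) for equally sized collectives.

    Each particle is a tuple (z, w, z_prime); all inputs are illustrative.
    """
    group_z, group_w, within_terms = [], [], []
    all_z, all_w, all_w_zprime = [], [], []
    for particles in collectives:
        z = [p[0] for p in particles]
        w = [p[1] for p in particles]
        dz = [p[2] - p[0] for p in particles]
        # Particle-level Price equation inside this collective.
        within_terms.append(cov(w, z) + mean(wi * d for wi, d in zip(w, dz)))
        group_z.append(mean(z))
        group_w.append(mean(w))
        all_z += z
        all_w += w
        all_w_zprime += [p[1] * p[2] for p in particles]
    between = cov(group_w, group_z)      # Cov_i(w, z)
    within = mean(within_terms)          # E_i[Cov_j(w, z) + E_j[w dz]]
    # Left-hand side computed directly: w_bar * (offspring mean - parent mean).
    z_bar_O = sum(all_w_zprime) / F(sum(all_w))
    lhs = mean(all_w) * (z_bar_O - mean(all_z))
    return lhs, between, within

# Case 1: selection and transmission bias at both levels.
general = [
    [(7, 6, 7), (8, 5, 8), (6, 7, 6)],
    [(4, 5, 4), (6, 3, 5), (5, 4, 5)],
    [(1, 3, 1), (2, 2, 2), (0, 4, 1)],
]
# Case 2: within each collective, equal fitness and perfect transmission.
no_particle_selection = [
    [(7, 6, 7), (8, 6, 8), (6, 6, 6)],
    [(4, 4, 4), (6, 4, 6), (5, 4, 5)],
    [(1, 2, 1), (2, 2, 2), (0, 2, 0)],
]

lhs1, between1, within1 = multilevel_terms(general)
lhs2, between2, within2 = multilevel_terms(no_particle_selection)
print(lhs1 == between1 + within1)   # True
print(within2, lhs2, between2)      # 0 4 4
```

In the second case the entire change is carried by the between-collective covariance, which is the "selection at the level of collectives" reading given above.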

One more point, which will serve as a transition to bottom-up thinking, bears mentioning. The term
$E_i[\mathrm{Cov}_j(w, z) + E_j[w\Delta z]]$ has been described in two distinct ways: (1) as selection and transmission bias when seen from
the particle-level, and (2) as the transmission bias when seen from the collective level. The Price equation thus reveals an
interesting fact about multilevel selection: transmission bias at level i is an evolutionary process unto itself at level $(i - 1)$.
To take a concrete example, if we observe the preferential transmission of a particular allele in a human population, this can
be understood either as a sort of transmission bias at the phenotypic level ($z'_i > z_i$ and so $E[w_i\Delta z_i] > 0$), or we can consider
it to be a process of meiotic drive, an evolutionary process unto itself, taking place at the level of competing genes. As we
will see shortly, starting at either of these levels, we can build upwards to consider how lower-level evolutionary processes
determine evolution at higher levels. Evolving systems, at least in the Price framework, can be seen as a vast series of nested
and inter-determining levels of selection.


A2.2. Bottom-Up 

Just as the top-down derivation employed a clever recursive trick, so this bottom-up derivation utilizes another tool in 
the arsenal of mathematical analysis: decomposing covariance into a within- and a between-group component. To demonstrate
this decomposition I rely on graphics and intuition. A fully rigorous mathematical approach can be found in Wade
(1985, 62-3).
Fig. A2. Population Dispersion of z-trait.


Consider the covariance term we wish to decompose: $\mathrm{Cov}(w, z)$. Our decomposition involves separating an observed dispersion
into two components: one that is “explainable” by the group to which the particles belong and another explainable
by the dispersion within the group. Suppose we have a total population P consisting of 9 particles, $j \in \{1, \ldots, 9\}$, all of which have varying
levels of z-trait. Considered as members of a group, however, each particle will be
indexed with an $i \in \{1, 2, 3\}$. We thus have three groups, $k \in \{A, B, C\}$ (Fig. A.2).

Each circle marker in this graph represents a particle j in the population P. In this figure, the fact that they are grouped
into three distinct collectives is irrelevant. We are considering the total dispersion of the particles, measured by the sum of
their distances from the population mean, represented by the horizontal green line. The center particle, directly above ‘B’,
shows how we measure this quantity: for each particle, we draw a line like the one connecting the center particle to the
mean. Then we sum up this difference across all particles. Now, this would be a poor measure of dispersion, since extremely
above-average and extremely below-average particles would cancel out, making the dispersion seem small when it is really
large. For that reason, these differences are typically squared to yield the formula for mean-squared distance:


$$\mathrm{MSD} = \sum_{j=1}^{9} (z_j - \bar{z})^2,$$


where $z_j$ is as above and $\bar{z}$ is the population average z-level (here, 4.66).40

Our formula will be rather different, however, since we are not interested in variance, but in covariance. So, we must 
construct a second graph, similar to the first, except on the y-axis we have fitness w, instead of z. Then, instead of MSD, we 
will calculate: 


$$\mathrm{Cov}(w, z) = \frac{1}{9}\sum_{j=1}^{9} (w_j - \bar{w})(z_j - \bar{z}).$$


The second graph corresponds to the first term in the covariance expression, $(w_j - \bar{w})$, which measures our fitness dispersion
(Fig. A.3):

Again, we calculate the dispersion by subtracting each j’s value from the population mean. Our expression for covariance
now conveys useful information: if $w_j$ tends to move in the same direction as $z_j$, then $\mathrm{Cov}(w, z)$ will be positive and fairly large. If,
on the other hand, they tend to move in opposite directions, then $\mathrm{Cov}(w, z)$ will be negative. If they exhibit little to no
correspondence, then $\mathrm{Cov}(w, z)$ will be close to 0.

Notice, further, that we have represented z as a fairly altruistic trait: within groups, those with higher z-level have fitness
that is lower than the group average. Consider, for example, j = 2 with a high $z_2 = 8$, but a fitness level $w_2$ that is lower than


40 The sum is often multiplied by 1/n to give us an average, denoted $S^2$, rather than a total that strictly increases as we add more particles. When used for
estimation purposes, as in many statistics textbooks, the sum is divided by n − 1 rather than n for somewhat esoteric reasons involving the desire for an
unbiased estimator.
Fig. A3. Population Dispersion of Fitness.


either of its two fellow members of collective A. Nevertheless, group A as a whole does quite well. Even the highly self-sacrificial
particle j = 2 outperforms all members of the selfish group C. 

Now, what the decomposition technique shows is that we can break up these dispersion-measures into within- and 
between-group components. This is easily visualized. First consider between-collective dispersion (Fig. A.4). 

Now the centered vertical line measures the distance between the B-collective mean and the population average, representing
the center particle indirectly, only through its influence on the collective mean (the thick blue bar). In practice, we
draw a similar graph for fitness w, calculating the distance between each collective’s mean and the population mean. Then
we calculate the between-group covariance of z-level and w, denoting this value $\mathrm{Cov}_k(w, z)$. We also run through a similar
process to calculate within-group covariance between z-level and w.

In Fig. A.5, we add up the differences between particles’ z-levels and the collective average, rather than the population 
average. Hence, we draw a line from the center particle to 5.33, its within-collective average, rather than to 4.66, the total 
population average. Again, we do something similar to find the dispersion of within-group fitness w. Now, clearly, differences 
within the group account for much of the total population dispersion. But, just as clearly, they do not account for all of 
it, because the within-collective averages are (almost by definition) closer to their within-group particles than they are 
to the whole population-wide gamut of particles. The rest of the population dispersion is captured by the dispersion of 
collective means from the population mean, visualized in Fig. A.4. The claim of the decomposition technique is that the 
total population dispersion can be captured by summing the average within-collective covariance and the between-collective 
covariance. 

With this intuition, let’s briefly formalize the decomposition claim. If we let i index particles within collectives k, then $w_{ik}$
is the fitness of the ith particle in the kth group, with $i \in \{1, 2, 3\}$ and $k \in \{A, B, C\}$. For example, $w_{2B}$ denotes the fitness of the center
particle, i.e. the second particle in the B group. Similarly, $z_{ik}$ is the z-level of the ith particle in the kth group. If we write simply
$\bar{w}_k$, then we are denoting the within-collective mean of k. As above, $\mathrm{Cov}_k(w, z)$ denotes the covariance between the within-collective
average fitness and the within-collective average z-level. Expanded, this is: $\mathrm{Cov}_k(w, z) = \frac{1}{3}\sum_{k=A}^{C} (\bar{w}_k - \bar{w})(\bar{z}_k - \bar{z})$.
On the other hand, we will use $\mathrm{Cov}_i$ to denote a covariance within a group k, so that $\mathrm{Cov}_i(w, z) = \frac{1}{3}\sum_{i=1}^{3} (w_{ik} - \bar{w}_k)(z_{ik} - \bar{z}_k)$.
With this notation, we can formally spell out the intuitive claim that total population dispersion is decomposable into
between- and within-group components (Fig. A.5):


$$\mathrm{Cov}(w, z) = \mathrm{Cov}_k(w, z) + E_k[\mathrm{Cov}_i(w, z)] \qquad (8)$$


The population-covariance between fitness w and z-trait is equal to the sum of (i) the covariance between collective
fitness $\bar{w}_k$ and collective z-trait $\bar{z}_k$ (i.e. the averages within the collective) and (ii) the mean of the within-collective covariances
between fitness w and z-trait.
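
Eq. (8) can be checked on a small data set. The nine particles below are hypothetical: they are chosen only so that the z-means match those quoted in the text (population mean 14/3 ≈ 4.66, B-collective mean 16/3 ≈ 5.33) and so that z behaves as an altruistic trait within each collective. They are not the data behind the figures.

```python
from fractions import Fraction as F

# Hypothetical (z-levels, fitnesses) for three equally sized collectives.
collectives = {
    "A": ([6, 8, 7], [7, 5, 6]),
    "B": ([5, 6, 5], [5, 4, 5]),
    "C": ([2, 1, 2], [3, 4, 3]),
}

def mean(xs):
    xs = list(xs)
    return sum(xs) / F(len(xs))

def cov(xs, ys):
    return mean(x * y for x, y in zip(xs, ys)) - mean(xs) * mean(ys)

all_z = [z for zs, _ in collectives.values() for z in zs]
all_w = [w for _, ws in collectives.values() for w in ws]

# Total covariance over all nine particles, ignoring group membership.
total = cov(all_w, all_z)

# Between-collective covariance of the collective means, Cov_k(w, z).
between = cov([mean(ws) for _, ws in collectives.values()],
              [mean(zs) for zs, _ in collectives.values()])

# Average of the within-collective covariances, E_k[Cov_i(w, z)].
within = mean(cov(ws, zs) for zs, ws in collectives.values())

print(total, between, within)  # 2 64/27 -10/27
```

With these invented numbers the within-collective component is negative (altruists do worse than their group-mates) while the between-collective component is positive and larger, so the total covariance is positive.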

Armed with Eq. (8), the bottom-up derivation of Eq. (7) is quite simple. 


Lemma. In the multilevel context, Eq. (2) is equivalent to the following: 


$$\bar{w}\Delta\bar{z} = \mathrm{Cov}_k(w, z) + E_k[\mathrm{Cov}_i(w, z) + E_i[w\Delta z]]$$
Fig. A4. z-Dispersion Between Groups.


Fig. A5. z-Dispersion Within Groups.


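Proof. Recall Eq. (2) and apply Eq. (8) to its covariance term:

$$
\begin{aligned}
\bar{w}\Delta\bar{z} &= \mathrm{Cov}(w, z) + E[w\Delta z] \\
&= \mathrm{Cov}_k(w, z) + E_k[\mathrm{Cov}_i(w, z)] + E[w\Delta z]
\end{aligned}
$$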


An easily derivable fact about expected value is that, for any partition of the population P into equally sized collectives (as assumed above), the expectation over the total
population is equal to the expected value of the expected values within the elements of the partition. Hence,


$$E[w\Delta z] = E_k[E_i[w\Delta z]] = \frac{1}{K}\sum_{k=1}^{K}\frac{1}{n_k}\sum_{i=1}^{n_k} w_{ik}\,\Delta z_{ik},$$


where K is the total number of collectives (elements in the partition), $n_k$ is the total number of particles in collective k, $w_{ik}$
is the fitness level of the ith particle in the kth collective, and $z_{ik}$ is the z-level of the ith particle in the kth collective.
Fig. A6. Regression of w on z.


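Inserting this equation into the above expression...


$$
\begin{aligned}
\bar{w}\Delta\bar{z} &= \mathrm{Cov}_k(w, z) + E_k[\mathrm{Cov}_i(w, z)] + E_k[E_i[w\Delta z]] \\
&= \mathrm{Cov}_k(w, z) + E_k[\mathrm{Cov}_i(w, z) + E_i[w\Delta z]]
\end{aligned}
$$


□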


In contrast to the top-down derivation, this one began with particles, not considering whether these particles are them- 
selves collectives comprised of lower level particles. They may be. We could then apply the same top-down derivation to 
these particles, identifying them as intermediary collectives, made up of particles but also constituting particles for higher- 
level collectives. Evolutionary systems are thus seen to be nested processes, teeming with activity at each level, an activity 
that often expresses itself through emergent coherence and unity at higher levels. 


A3. Linear regression 


Linear Regression. Given a sample of data points, a linear regression is a line of best fit, one which minimizes the
summed distance of points from the line.41 The equation for this line of best fit is represented as...


$$w = \beta_0 + \beta_1 z + \epsilon, \qquad (9)$$


where w is the variable we are running the regression of and z is the variable this regression is on. $\beta_0$ and $\beta_1$ are the
constants chosen to minimize the summed distance of data points from the line. Since the model will not be perfect, $\epsilon$ is
an error term, which we will assume exhibits no systematic bias (i.e. the mean $E[\epsilon] = 0$).

We can visualize this line with the data we posited above when analyzing the within- and between-collective compo- 
nents of covariance (Fig. A.6): 

This figure plots the various particles from the population discussed above and fits a line that minimizes the summed
distance between the particles and this line. In this case, we get $\beta_0 = 4.081$, $\beta_1 = .076$, and hence $w = .076z + 4.081$. This
tells us that there is a slight positive correlation between fitness and the z trait, despite the fact that this trait requires self-sacrifice
on the part of the particles that possess it. In other words, selection at the collective-level is slightly overpowering
selection at the particle-level in this population.

But how did we calculate $\beta_0$ and $\beta_1$? There are several methods (and computer programs), but the most instructive is
to consider the expected values of our variables and to solve for $\beta_0$ and $\beta_1$.


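$$
\begin{aligned}
w &= \beta_0 + \beta_1 z + \epsilon \\
E[w] &= E[\beta_0 + \beta_1 z + \epsilon] \\
&= E[\beta_0] + E[\beta_1 z] + E[\epsilon]
\end{aligned}
$$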


41 Where distance is measured by the square of the difference. 
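$$= \beta_0 + \beta_1 E[z] + 0$$

Therefore, $\beta_0 = E[w] - \beta_1 E[z]$.

Next, consider the covariance between w and z:

$$
\begin{aligned}
\mathrm{Cov}(z, w) &= \mathrm{Cov}(z, \beta_0 + \beta_1 z + \epsilon) \\
&= \mathrm{Cov}(z, \beta_0) + \mathrm{Cov}(z, \beta_1 z) + \mathrm{Cov}(z, \epsilon) \qquad \text{(from properties of covariance)} \\
&= \beta_0\mathrm{Cov}(z, 1) + \beta_1\mathrm{Cov}(z, z) + \mathrm{Cov}(z, \epsilon) \\
&= 0 + \beta_1\mathrm{Var}(z) + 0
\end{aligned}
$$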

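This last step follows from the independence of z and $\epsilon$, the fact that the covariance of any variable with a constant is zero, and the fact that Cov(X, X) = Var(X).
We therefore conclude:

$$\beta_0 = E[w] - \beta_1 E[z], \qquad \beta_1 = \frac{\mathrm{Cov}(z, w)}{\mathrm{Var}(z)} \qquad (10)$$

The second equation in (10) shows how we derive the equations that begin Section 2.3.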
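
The two formulas in (10) can be applied directly. The sketch below uses nine hypothetical (z, w) pairs; they are not the data plotted in Fig. A6, so the coefficients differ from the 4.081 and .076 reported above. The final two lines check the properties the derivation relies on: the fitted errors average to zero and are uncorrelated with z.

```python
from fractions import Fraction as F

# Hypothetical trait levels z and fitnesses w for nine particles
# (three collectives of three); not the data behind Fig. A6.
z = [6, 8, 7, 5, 6, 5, 2, 1, 2]
w = [7, 5, 6, 5, 4, 5, 3, 4, 3]

def mean(xs):
    xs = list(xs)
    return sum(xs) / F(len(xs))

def cov(xs, ys):
    return mean(x * y for x, y in zip(xs, ys)) - mean(xs) * mean(ys)

# Eq. (10): slope from covariance over variance, intercept from the means.
beta1 = cov(z, w) / cov(z, z)
beta0 = mean(w) - beta1 * mean(z)

# Fitted errors implied by w = beta0 + beta1 * z + epsilon.
residuals = [wi - (beta0 + beta1 * zi) for zi, wi in zip(z, w)]

print(beta0, beta1)                         # 35/12 3/8
print(mean(residuals), cov(z, residuals))   # 0 0
```

As in the text's example, the slope is positive even though, within each hypothetical collective, particles with more of the z-trait have lower fitness.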


A4. The Price equation and partial regression coefficients


Recall the basic form of the Price Equation: 
$$\bar{w}\Delta\bar{z} = \mathrm{Cov}(w_i, z_i) + E[w_i\Delta z_i] \qquad (11)$$


Now, let us consider the fitness of individual i, $w_i$, as depending partly on the trait $z_i$ and partly on the average level of the
altruistic trait within the group, denoted $z_j$. Given that it is an altruistic trait, fitness will be undermined by the possession
of this trait, but enhanced by others possessing it. And since i's possession of the trait will be correlated with the group’s
possession of this trait, we need a way to separate out these two effects. A simple regression equation can do this for us. Let
$\beta_{w_i z_i \cdot z_j}$ be the partial regression coefficient for $w_i$ on $z_i$. That is, $\beta_{w_i z_i \cdot z_j}$ represents the strength of the effect that changing $z_i$
has on $w_i$ holding $z_j$ constant. Similarly, $\beta_{w_i z_j \cdot z_i}$ will represent the partial regression coefficient for $w_i$ on $z_j$. That is, $\beta_{w_i z_j \cdot z_i}$
represents the strength of the effect that changing $z_j$ has on $w_i$ holding $z_i$ constant. As is standard, we will also include the
y-intercept term $\beta_0$ and an error term $\epsilon$:


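$$w_i = \beta_0 + \beta_{w_i z_i \cdot z_j} z_i + \beta_{w_i z_j \cdot z_i} z_j + \epsilon \tag{12}$$

Now, take the expression in (12) and substitute it into expression (11). Let $\Delta z_i = 0$, since we are not interested in drift, but
selection. So, setting $E[w_i \Delta z_i] = 0$ and substituting...

$$\bar{w}\,\Delta\bar{z} = \operatorname{Cov}(\beta_0 + \beta_{w_i z_i \cdot z_j} z_i + \beta_{w_i z_j \cdot z_i} z_j + \epsilon,\; z_i) \tag{13}$$
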
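Relevant Covariance Rules

$$\begin{aligned}
\operatorname{Cov}(X + Y, Z) &= \operatorname{Cov}(X, Z) + \operatorname{Cov}(Y, Z) \\
\operatorname{Cov}(aX, Z) &= a\operatorname{Cov}(X, Z) \\
\operatorname{Cov}(X, X) &= \operatorname{Var}(X) \\
\operatorname{Cov}(X, Y) &= \beta_{YX}\operatorname{Var}(X)
\end{aligned}$$

Here $\beta_{YX}$ is the coefficient from regressing $Y$ on $X$.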


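Applying covariance rules to Eq. (13), and noting that neither the constant $\beta_0$ nor the error term $\epsilon$ (assumed
uncorrelated with $z_i$) covaries with $z_i$, we get:

$$\bar{w}\,\Delta\bar{z} = \beta_{w_i z_i \cdot z_j}\operatorname{Var}(z_i) + \beta_{w_i z_j \cdot z_i}\operatorname{Cov}(z_j, z_i)$$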


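Because $\operatorname{Cov}(z_j, z_i) = \operatorname{Cov}(z_i, z_j) = \beta_{z_j z_i}\operatorname{Var}(z_i)$...

$$\begin{aligned}
\bar{w}\,\Delta\bar{z} &= \beta_{w_i z_i \cdot z_j}\operatorname{Var}(z_i) + \beta_{w_i z_j \cdot z_i}\,\beta_{z_j z_i}\operatorname{Var}(z_i) \\
&= (\beta_{w_i z_i \cdot z_j} + \beta_{z_j z_i}\,\beta_{w_i z_j \cdot z_i})\operatorname{Var}(z_i)
\end{aligned} \tag{O}$$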
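
The decomposition in (O) can be checked numerically. The following is a minimal sketch, assuming a hypothetical population of 200 groups of 10 in which altruists cluster by group, and fitness coefficients chosen purely for illustration. It estimates the partial regression coefficients by least squares and compares $\operatorname{Cov}(w_i, z_i)$ with the right-hand side of (O).

```python
import numpy as np

def cov(x, y):
    """Sample covariance (population form, dividing by n)."""
    return np.mean((x - x.mean()) * (y - y.mean()))

rng = np.random.default_rng(1)
n_groups, group_size = 200, 10

# Hypothetical population: each group has its own propensity to contain altruists,
# so an altruist tends to be surrounded by other altruists.
group_propensity = rng.uniform(0.1, 0.9, size=n_groups)
traits = (rng.random((n_groups, group_size)) < group_propensity[:, None]).astype(float)
z_j = np.repeat(traits.mean(axis=1), group_size)  # group-average trait, one entry per member
z_i = traits.ravel()                              # individual trait (1 = altruist)

# Illustrative fitness function in the form of Eq. (12): a direct cost, a group benefit.
w_i = 1.0 - 0.2 * z_i + 0.6 * z_j + rng.normal(scale=0.05, size=z_i.size)

# Partial regression coefficients from a least-squares fit of w_i on (1, z_i, z_j).
X = np.column_stack([np.ones_like(z_i), z_i, z_j])
coef, *_ = np.linalg.lstsq(X, w_i, rcond=None)
b0, b_direct, b_group = coef

beta_zj_zi = cov(z_j, z_i) / cov(z_i, z_i)  # regression of z_j on z_i

lhs = cov(w_i, z_i)                                       # selection term of the Price equation
rhs = (b_direct + beta_zj_zi * b_group) * cov(z_i, z_i)   # right-hand side of (O)
```

Because least-squares residuals are uncorrelated with the regressors in-sample, `lhs` and `rhs` should agree up to rounding error in this sketch.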


To understand the meaning of expression (O), we can borrow from the analysis of Frank (1998) to reconstruct its
complicated coefficient term. Let the regression coefficient $\beta_{w_i z_i}$ denote exactly what we want to know: the total,
non-decomposed, statistical (i.e. not necessarily causal) effect of a trait $z_i$ on the organism's fitness, $w_i$. Frank points out
that this regression coefficient contains two separable components. One is the direct, causal effect of $z_i$ on $w_i$, ignoring any
indirect effects that result from the group. We have been representing this component with the familiar partial regression
coefficient $\beta_{w_i z_i \cdot z_j}$. The second component consists of (i) the statistical effect of trait $z_i$ on the group level
of the trait, denoted $z_j$, multiplied by (ii) the causal effect of the group level $z_j$ on an organism's fitness $w_i$. We denote (i)
with the regression coefficient $\beta_{z_j z_i}$ and (ii) with the partial regression coefficient $\beta_{w_i z_j \cdot z_i}$. This can
be visualized by the following diagram, adapted from Frank (1998, 52) (Fig. A7):

Fig. A7. Relating partial regression coefficients and fitness.

As this diagram indicates...

$$\beta_{w_i z_i} = \beta_{w_i z_i \cdot z_j} + \beta_{z_j z_i}\,\beta_{w_i z_j \cdot z_i} \tag{X}$$
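
To see identity (X) at work, the sketch below splits a total regression coefficient into its direct and indirect parts on simulated data. The function names `slope` and `decompose_total_effect`, and all numerical values, are assumptions made for illustration only.

```python
import numpy as np

def slope(y, x):
    """Simple regression coefficient of y on x: Cov(x, y) / Var(x)."""
    xc = x - x.mean()
    return np.dot(xc, y - y.mean()) / np.dot(xc, xc)

def decompose_total_effect(w, z_i, z_j):
    """Return (total, direct, indirect) effects of z_i on w, following identity (X)."""
    X = np.column_stack([np.ones_like(z_i), z_i, z_j])
    _, b_direct, b_group = np.linalg.lstsq(X, w, rcond=None)[0]  # partial coefficients
    total = slope(w, z_i)                  # total coefficient of w on z_i
    indirect = slope(z_j, z_i) * b_group   # (i) times (ii) in the text
    return total, b_direct, indirect

# Hypothetical data in which the group-level trait z_j covaries with z_i.
rng = np.random.default_rng(2)
z_i = rng.normal(size=1000)
z_j = 0.4 * z_i + rng.normal(size=1000)
w = 1.0 - 0.3 * z_i + 0.8 * z_j + rng.normal(scale=0.2, size=1000)

total, direct, indirect = decompose_total_effect(w, z_i, z_j)
```

In this sketch `total` should equal `direct + indirect` up to rounding, even though `direct` is negative, which illustrates how a costly trait can still carry a positive total association with fitness.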


Comparing (O) and (X), we see that the right-hand side of (X) is equal to the coefficient in front of $\operatorname{Var}(z_i)$ in (O). This
is fantastic, because we now have some intuitive sense of what the expression (O) actually signifies. The adaptiveness,
or tendency to increase, of a trait $z$ is regulated partly by the effect it has on the organism's fitness, but also partly by the
product of (i) how likely an organism with that trait is to be surrounded by other organisms with that trait and (ii) the
impact on the organism of being in a group with the trait $z$.⁴² It's nice to understand what the mathematics means, but we
should also ask what we can do with it.

Henrich (2004) is able to get some serious mileage out of the expression (O). Since $\operatorname{Var}(z_i) > 0$ whenever the trait
varies at all within the population, this implies that a trait will be selected for, i.e. $\bar{w}\,\Delta\bar{z} > 0$, only if we have:


$$\beta_{w_i z_i \cdot z_j} + \beta_{z_j z_i}\,\beta_{w_i z_j \cdot z_i} > 0 \tag{**}$$


If we assume we are talking about an altruistic trait, then we know the following facts: 


$$\begin{aligned}
\beta_{w_i z_i \cdot z_j} &< 0 \\
\beta_{w_i z_j \cdot z_i} &> 0
\end{aligned}$$


Given these two facts, our prediction of whether an altruistic trait will spread depends upon a crucial structural feature
of the population. In order to ensure that condition (**) is satisfied, we would like $\beta_{z_j z_i}$ to be large and positive. In other
words, going back to the intuitive meaning of the expression, an altruistic trait is more likely to be selected when altruists
are capable of bunching together. This point is of fundamental importance for understanding multilevel selection and the
evolution of altruism, so it bears repeating: in order for an altruistic trait to evolve (through selection), altruists must have
some mechanism(s) for excluding egoists from their network of interaction or, equivalently, of converting egoists within
their network into altruists.
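
Condition (**) can be rearranged to give the minimum degree of assortment an altruistic trait needs: with $\beta_{w_i z_i \cdot z_j} < 0$ and $\beta_{w_i z_j \cdot z_i} > 0$, it holds exactly when $\beta_{z_j z_i} > -\beta_{w_i z_i \cdot z_j} / \beta_{w_i z_j \cdot z_i}$. The sketch below encodes that rearrangement. The function names and the numerical values are hypothetical and chosen only to illustrate the threshold.

```python
def selection_coefficient(b_direct, b_group, assortment):
    """Left-hand side of condition (**): direct effect plus assortment times group effect."""
    return b_direct + assortment * b_group

def assortment_threshold(b_direct, b_group):
    """Value of beta_{z_j z_i} that must be exceeded for (**) to hold.

    Assumes an altruistic trait, i.e. b_direct < 0 < b_group.
    """
    if not (b_direct < 0 < b_group):
        raise ValueError("expects an altruistic trait: b_direct < 0 < b_group")
    return -b_direct / b_group

# Illustrative values: a direct fitness cost of 0.2 and a group benefit of 0.5.
b_direct, b_group = -0.2, 0.5
threshold = assortment_threshold(b_direct, b_group)

# Evaluate (**) for weak and strong clustering of altruists.
selected_when_weak = selection_coefficient(b_direct, b_group, 0.1) > 0
selected_when_strong = selection_coefficient(b_direct, b_group, 0.7) > 0
```

With these illustrative numbers, altruism is selected only once the assortment coefficient exceeds the threshold, which is the sense in which altruists must be able to bunch together.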